(54) PEPTIDE SEQUENCE THAT FORMS MUCIN SUGAR CHAIN AND TECHNIQUE FOR 
MODIFYING PROTEIN TO BE LINKED WITH MUCIN SUGAR CHAIN 

(57) An amino acid sequence whereby a mucin 
sugar chain is specifically introduced into a protein or 
peptide chain, and a technique for introducing a mucin 
sugar chain into a protein or peptide by utilizing the 
amino acid sequence. In a protein or peptide containing 
an amino acid sequence represented by the following 
general formula (I): X(-1)-X(0)-X(1)-X(2)-X(3), the 
GalNAc moiety of UDP-GalNAc (wherein UDP represents 
uridine 5'-diphosphate, and GalNAc represents 
N-acetylgalactosamine) is introduced into the amino acid 
X(0) in the presence of a UDP-GalNAc: polypeptide 
α1,O-GalNAc transferase. In said formula (I), X(-1) and 
X(2) represent each independently an arbitrary amino 
acid; X(0) represents T or S; and X(1) and X(3) repre- 
sent each independently an arbitrary amino acid, pro- 
vided at least one of them represents P. 
Description 

BACKGROUND OF THE INVENTION 
Field of the Invention 



This invention relates to amino acid sequences with 
which a mucin type sugar chain can be introduced into 
a protein or peptide and also relates to a technique for 
introducing a mucin type sugar chain into a protein or 
peptide by utilizing the sequences. 

Background Art 

Many of the proteins found in animals, plants and 
insects are glycoproteins. A wide variety of roles of the 
sugar chains of glycoprotein has been unveiled in 
recent years. For example, it is known that a sugar chain 
has physiological roles as a ligand in cell adhesion and 
cell recognition as well as a physicochemical role of 
improving the stability and/or solubility of proteins. In 
addition, while glycoproteins such as erythropoietin and 
interferons have been developed as drugs in recent 
years, the structure of the sugar chain on the 
glycoproteins has a great influence on the pharmacokinetics and 
the stability of the drugs in vivo. Although the signifi- 
cance of the sugar chains of glycoprotein has been well 
recognized, no established techniques have been 
known so far for introducing a sugar chain into a specific 
position of a protein in a simple manner. 

For some drugs that are inherently glycoproteins, 
only their protein portions are typically prepared, with 
E. coli, on a mass production basis. When such a protein is 
administered as a drug, the kinetics, stability and 
antigenicity of the protein in vivo sometimes differ from those of the 
native glycoprotein due to the lack of sugar chains. The 
differences may, in turn, give rise to problems including 
impairment with a large dose and side effects. 

Even proteins produced in animal cells can become 
glycoproteins having sugar chains that are different 
from the native ones. Then, such proteins can also 
entail the problems as mentioned above. 

The above problems and other related problems 
may be solved by a technique of introducing a 
specific sugar chain into a specific position of a protein 
molecule. Further, various functional features of sugar 
chains can be selectively introduced into the protein 
with such a technique. Furthermore, the technique will 
show a wide variety of applications in the pharmaceuti- 
cal industry and other industries. 

Two major modes of binding a sugar chain to pro- 
tein have been known; an asparagine linked type sugar 
chain and a mucin type sugar chain. 

According to previous reports, asparagine linked 
type sugar chains attach to a consensus sequence of 
-Asn-Xaa-Ser/Thr- (Xaa ≠ Pro). However, it is also 
known that not all the sites having the consensus 
sequence in a protein have an asparagine linked type 
sugar chain. On the other hand, as for the mucin type 
sugar chain, there have been reports that amino 
acids such as Ser, Thr and Pro are frequently observed 
near the binding site. However, no reports have ever 
described well-defined characteristic features of the 
sequence of the binding site. 
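The asparagine linked consensus sequence described above lends itself to a simple pattern search. The following is a minimal sketch, assuming a protein given as a string of single-letter codes; the function name `find_n_sequons` is hypothetical and not part of the original disclosure. As the text notes, a match only marks a candidate site, not a site that actually carries a sugar chain.

```python
import re

# Asn-Xaa-Ser/Thr with Xaa != Pro. The lookahead lets overlapping
# candidates (e.g. "NNTS") all be reported.
N_SEQUON = re.compile(r"(?=N[^P][ST])")

def find_n_sequons(sequence):
    """Return 0-based indices of Asn residues that start a consensus sequon."""
    return [match.start() for match in N_SEQUON.finditer(sequence)]

if __name__ == "__main__":
    # "NAS" and "NNT" match; "NPT" is excluded because Xaa is Pro.
    print(find_n_sequons("ANASNPTNNT"))
```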

The asparagine linked type sugar chain differs from 
the mucin type sugar chain in the biosynthesis of the 
sugar chain. Specifically, the biosynthesis of an 
asparagine linked type sugar chain takes place 
co-translationally in protein synthesis and then the folding of 
the glycoprotein follows it. On the other hand, a mucin type 
sugar chain is introduced post-translationally, i.e., after 
the translation and folding of the protein. In addition, as for 
the asparagine linked type sugar chain, it has been 
reported that a large sugar chain having fourteen 
monosaccharides is transferred to a protein at one time and is 
recognized and controlled to form a proper protein 
structure by a molecular chaperone called calnexin. 
However, no molecular chaperone is known to date for the 
mucin type sugar chains. 

Thus, while the common sequence required for 
glycosylation with an asparagine linked type sugar chain is well 
known as described above, there is no way of knowing which 
position in a protein is suitable for introducing the 
sugar chain. In addition, there is no guarantee that the 
mutant protein having the sugar chain shows the same 
three dimensional structure and biological activity as its 
native protein. 

For the mucin type sugar chain, on the other hand, 
it may be safely assumed that the structure of a protein 
molecule is not significantly affected by introduction of a 
sugar chain, because the sugar chain is introduced after 
the protein folding. Therefore, a technique for 
introducing the mucin type sugar chain into a protein may be 
very promising for providing a protein with the functional 
features of sugar chains while maintaining the protein 
structure and activities unchanged. However, no 
structural features which are highly specific to the 
glycosylation site for a mucin type sugar chain have been found. 
Therefore, it has been impossible to date to specifically introduce a 
mucin type sugar chain into a certain portion of a 
protein molecule. 

While some characteristic aspects of peptide 
sequences around mucin type sugar chain binding sites 
are known as will be described hereinafter, an enzyme 
that introduces GalNAc (N-acetylgalactosamine) of a 
mucin type sugar chain into proteins is also known. This 
enzyme is called UDP-GalNAc: polypeptide α1,O-GalNAc 
transferase (O-GalNAc T). Further, O-GalNAc T is 
found in colostrum of cow and catalyzes a reaction in 
which GalNAc is transferred to serine or threonine of a 
protein or peptide as follows: 

UDP-GalNAc + protein → protein-GalNAc + UDP 

wherein UDP represents uridine 5'-diphosphate and 
GalNAc represents N-acetylgalactosamine. 

The enzyme is found in a variety of sources. For 
instance, it was found in colostrum of cow by A. Elhammer 
et al. [J. Biol. Chem., Vol.261, pp.5249-5255 
(1986)], in rat ascites hepatoma cells, AH99, by M. Sugiura 
et al. [J. Biol. Chem., Vol.257, pp.9501-9507 
(1982)], in porcine submaxillary gland by Y. Wang et al. 
[J. Biol. Chem., Vol.267, pp.12709-12716 (1992)], and in pig 
trachea by J. M. Cottrell et al. [Biochem. J., Vol.283, 
pp.299-305 (1992)]. Further, the success in cloning the 
genes of the enzyme has been reported [F. L. Homa et 
al., J. Biol. Chem., Vol.268, pp.12609-12616 (1993)]. It 
has also been reported that a large amount of the 
enzyme can be obtained in insect cells and animal cells 
by means of genetic engineering techniques [F. L. 
Homa et al., Protein Expr. Purif., Vol.6, pp.141-148 
(1995) and S. Wragg et al., J. Biol. Chem., Vol.270, 
pp.16947-16954 (1995)]. 

Although several studies on characteristic aspects 
of peptide sequences binding mucin type sugar chains 
have been reported, they mostly rely on statistical 
methods with which amino acid sequences are analyzed 
particularly at and around the sites where mucin type sugar 
chains are bound. Very few of them deal with the actual 
use of the peptides thus obtained with O-GalNAc T in 
order to analyze the reactivity of the peptide sequences. 

I. B. H. Wilson et al. compared peptide sequences 
of glycosylation sites of mucin type sugar chains with 
those of non-glycosylation sites. They reported 
that proline, serine and threonine are frequently found 
at positions between -3 and +3 of each binding site 
(hereinafter, with regard to the locations of amino acids 
in a peptide sequence, the position to which a sugar 
chain is transferred is denoted as Position 0, and 
positions next to Position 0 and sequentially approaching 
the N-terminal are respectively referred to as Positions 
-1, -2 and -3 in order, whereas positions next to Position 
0 and sequentially approaching the C-terminal are 
respectively referred to as Positions +1, +2 and +3 in 
order). Further, proline is found at Positions -1 and +3 
with a relatively high frequency. However, they 
concluded that it is difficult to definitely describe the 
characteristic features of the sites suitable for binding mucin 
type sugar chains, because the specific sequences they 
found are also found at positions other than the binding 
site. Furthermore, they did not carry out actual 
experiments on peptides in order to confirm their statistical 
findings [Biochem. J., Vol.275, pp.529-534 (1991)]. 

A. A. Gooley et al. analyzed the sugar chain binding 
site of a mucin type glycoprotein of rat called CD8α with 
Edman degradation. They proposed a motif of 
Xaa-Pro-Xaa-Xaa (where at least one of the Xaa's 
represents threonine binding a mucin type sugar chain) 
that can be used as a consensus sequence for the 
glycosylation site of a mucin type sugar chain. However, the 
motif is not feasible for a wide scope of application since 
it cannot satisfactorily define the mucin type sugar chain 
binding site of glycoproteins derived from other sources 
[Biochem. Biophys. Res. Commun., Vol.178, pp.1194-1201 
(1991)]. Later, they also analyzed human glycophorin 
A [Glycobiology, Vol.4, pp.413-417 (1994)] and 
bovine κ-casein [Glycobiology, Vol.4, pp.837-844 
(1994)], which are also mucin type glycoproteins, in a 
similar manner. As a result, they proposed the 
following four motifs as an extension of the preceding 
proposal. In the following, Thr(GalNAc) represents a 
threonine residue binding a mucin type sugar chain. 

1. Xaa-Pro-Xaa-Xaa 

where at least one of the Xaa's represents Thr(GalNAc), 

2. Thr(GalNAc)-Xaa-Xaa-Xaa 

where at least one of the Xaa's represents threonine, 

3. Xaa-Xaa-Thr(GalNAc)-Xaa 

where at least one of the Xaa's represents lysine or 
arginine, and 

4. Ser(GalNAc)-Xaa-Xaa-Xaa 

where at least one of the Xaa's represents serine. 

Even with this extension, however, the motifs do not 
satisfactorily define the sugar chain introducing site of 
glycoproteins of other mucin types. Furthermore, the above 
motifs may cover peptide sequences having no sugar 
chain and hence some limitations have to be defined for 
the Xaa's. In addition, they have not verified the motifs by 
actually applying them to peptides. 

J. D. Young et al. reported that the activity of a 
GalNAc acceptor can be measured in vitro by utilizing 
O-GalNAc T derived from swine submaxillary gland and a 
synthesized peptide as a substrate [Biochemistry, 
Vol.18, pp.4444-4448 (1979)]. Their report says that 
TPPP, RTPPP, PRTPPP, TPRTPPP and VTRTPPP, 
which are derived from bovine myelin basic protein, are 
highly active GalNAc acceptors and that VTRTPPP is the 
most active among them. However, the sequences may 
not be regarded as characteristic features of a GalNAc 
acceptor because only a small number of analogous 
peptides were studied. In addition, the fact that proline is found 
at all of Positions +1 to +3 can significantly limit the 
applicability of these peptides for introducing mucin type 
sugar chains into a protein or peptide. 

B. O'Connell et al. carried out a statistical analysis on 
sites for binding mucin type sugar chains and predicted 
that the amino acids at Positions -6, -1 and +3 are 
important. They actually synthesized peptides by 
modifying a peptide having twelve amino acid residues, 
PHMAQVTVGPGL (Positions -6 through +5), which is 
found at and near the site binding a sugar chain of 
human von Willebrand factor, by changing the amino 
acids at Positions -6, -1 and +3 for three other 
amino acids (arginine, glutamic acid, proline or 
isoleucine) in order to confirm their influence on GalNAc 
acceptor activity. However, they failed to discover 
characteristic features of the peptide sequence necessary for 
binding mucin type sugar chains and simply concluded 
that amino acid substitution at any position can 
significantly affect the GalNAc acceptor activity [Biochem. 
Biophys. Res. Commun., Vol.180, pp.1024-1030 (1991)]. 

Later, B. O'Connell et al. prepared peptides by sub- 
stituting the amino acids at Positions -6 through +5 of 

EP 0 754 703 A1 



the same peptide with five amino acids (arginine, 
glutamic acid, proline, isoleucine and alanine) and 
studied their influence on the GalNAc acceptor activity. As a 
result, they reported that the activity is adversely 
affected when the amino acids at Positions +3, -3 and 
-2 are substituted by different amino acids or the amino 
acid at Position -1 is substituted by an electrically 
charged amino acid, while the remaining positions have 
little influence on the GalNAc acceptor activity. These 
results are inconsistent with their previous report above. 
This indicates that the statistical analysis has little to do 
with the actual binding reactivity of mucin type sugar 
chains. Moreover, they studied only the positions at which an amino 
acid substitution decreased the activity, and 
failed to show what amino acids can favorably be used 
for such substitution in general because they used only 
a limited set of amino acids. Thus, they could not draw general 
conclusions on peptide sequences binding mucin type 
sugar chains [J. Biol. Chem., Vol.267, pp.25010-25018 
(1992)]. 

Ake P. Elhammer et al. statistically analyzed 
Positions -4 through +4 of peptides and prepared an 
algorithm to support a theory that the peptide sequence is 
not important in terms of sites for binding mucin type 
sugar chains so long as the binding site and its 
neighborhood comprise serine, threonine, proline, alanine 
and/or glycine. Further, they proposed PPASTSAPG as 
a possible ideal peptide sequence for introducing a 
mucin type sugar chain. The proposed peptide actually 
had the highest degree of GalNAc acceptor activity 
when compared with four other peptide sequences 
including RTPPP. However, since the comparison was 
limited to only four types of peptide sequences 
containing similar sequences, there remains a doubt as to whether the 
proposed sequence is really ideal. In addition, it was 
shown that GalNAc cannot be introduced into a protein 
having a large number of sites which should be binding 
sites according to their algorithm. Therefore, the 
proposed sequence cannot feasibly be used for introducing 
mucin type sugar chains into a variety of proteins [J. Biol. 
Chem., Vol.268, pp.10029-10038 (1993)]. 

The glycoprotein "mucin", after which the mucin 
type sugar chain is named, comprises a region where 
amino acid sequences having 20 to 30 residues are 
repeated in tandem. It is also known that a large number 
of mucin type sugar chains are bound in that region. I. 
Nishimori et al. prepared various peptides analogous to 
the repetitive region of human mucin MUC1 and studied 
the GalNAc acceptor activity by using a crude enzyme 
solution of O-GalNAc T extracted from human breast 
cancer cells MCF7 [J. Biol. Chem., Vol.269, pp.16123-16130 
(1994)]. They reported that, as a result of their 
study, the peptide region essentially required for the 
GalNAc acceptor covers Positions -1 through +4 and, 
further, that the proline at Position +3 can accelerate the 
transfer of GalNAc. On the other hand, they 
emphasized that the proline at Position +3 alone cannot 
provide sufficient GalNAc acceptor activity because no 
transfer of GalNAc is observed on PDTRPAPGS, 
PDTRPPAGS and PDTRAPPGS. Although they 
presumed that aspartic acid and arginine at Positions -1 
and +1 are major factors that obstruct the transfer, 
they did not carry out any experiment to confirm their 
theory. Therefore, it is difficult to fully understand the 
characteristic features of the peptide sequence that provide the 
GalNAc acceptor activity from their conclusions. 

As an example of the preparation of a mucin type 
glycoprotein by introducing sequences for binding mucin type 
sugar chains into a protein by genetic engineering, E. 
Gravenhorst et al. reported that they introduced mucin 
type sugar chains into interleukin 2 (IL-2) [Eur. J. 
Biochem., Vol.215, pp.189-197 (1993)]. They initially tried to 
introduce GGKAPTSSSTKGG, which included a 
sequence found on the periphery of the site binding a 
mucin type sugar chain in IL-2, between the 80th and 
the 81st amino acids of IL-2 and to express the sequence 
in an insect cell, but they failed. Thereafter, they 
succeeded in introducing a sugar chain by using 
GGKAPTPPPKGG, where all the serine residues of the above 
sequence were replaced with proline residues. 
However, the sequence may have been obtained by mere 
chance as a result of a trial and error process. 
Furthermore, the peptide sequence may not find wide 
applicability in view of the fact that the peptide sequence was 
long and contained as many as twelve residues, of 
which four were proline residues capable of greatly 
influencing the configuration of a protein. 

As described above, the characteristic features of 
peptide sequences for binding mucin type sugar chains 
can greatly vary depending on the selected population 
and the techniques used for statistical processing where 
statistical analysis is involved. This is probably because it 
is technically very difficult to definitely know the sites 
binding mucin type sugar chains in natural proteins and 
hence quite a limited number of sites for binding sugar 
chains have been identified so far. In many cases, the 
characteristic features obtained for certain peptide 
sequences can differ from sequences found in native 
proteins having GalNAc acceptor activity and, therefore, 
may not be accurate. 

On the other hand, only a small number of reports on the 
GalNAc acceptor activity of synthesized peptides are 
known. The number of peptides that have been 
analyzed so far is limited simply because peptide synthesis 
per se is difficult. Most of the peptide sequences that 
have been studied are relatively long ones having ten or 
more residues, and the characteristic features of short 
peptide sequences that can suitably be used for 
introducing mucin type sugar chains are yet to be unveiled. 

Thus, it is very difficult to intentionally introduce a 
mucin type sugar chain into a protein on the basis of the 
previously described characteristic features of peptide 
sequences that have been reported. Furthermore, since 
peptide sequences that are believed to be suitable for 
introducing mucin type sugar chains are relatively long, 
a mucin type sugar chain cannot be introduced simply 
by slightly modifying a protein or peptide. Therefore, the 
scope of utilization of the peptide sequences has been 
very limited. 

In the meantime, it might be possible to chemically 
synthesize a protein binding sugar chains. However, as 
may be understood by looking at natural mucin glyco- 
proteins, sugar chains have to be bound only to specific 
residues among a large number of serine or threonine 
residues in protein. It is thus extremely difficult to selec- 
tively and efficiently introduce a sugar chain into a spe- 
cific site among a large number of serine and threonine 
residues in a protein or peptide by means of any known 
techniques for organic synthesis. 

SUMMARY OF THE INVENTION 

As a result of research efforts on peptide 
sequences as a site suitable for binding mucin type 
sugar chains, we have now found short peptide 
sequences into which mucin type sugar chains can be 
introduced efficiently. The present invention is based on 
this finding. 

Therefore, an object of the present invention is to 
provide peptide sequences as a binding site into which 
mucin type sugar chains can be introduced. 

Another object of the present invention is to provide 
peptide sequences into which mucin type sugar chains 
can be introduced by means of a catalytic activity of 
O-GalNAc T. 

Still another object of the present invention is to 
provide a technique capable of introducing mucin type 
sugar chains into a protein by minimum mutation of the 
protein without damaging its functional features. 

A protein or peptide according to the invention com- 
prises a sequence represented by the following formula 
(I): 

X(-1)-X(0)-X(1)-X(2)-X(3) (I) 

wherein 

X(-1) and X(2) represent independently any amino 
acid, 

X(0) represents threonine (T) or serine (S), 
X(1) and X(3) represent independently any amino 
acid provided that at least one of X(1) and X(3) 
represents proline (P). 
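Formula (I) can be read as a simple scanning rule over a sequence written in single-letter codes. The sketch below is an illustration only, assuming uppercase input; the function name `find_formula_i_sites` is hypothetical. It reports where the pattern occurs and says nothing about how efficiently a given site would actually accept GalNAc.

```python
def find_formula_i_sites(sequence):
    """Return 0-based indices of each T or S that can serve as X(0) of formula (I).

    A site qualifies when one residue precedes it (X(-1)), three residues
    follow it (X(1)..X(3)), and X(1) and/or X(3) is proline.
    """
    sites = []
    for i, residue in enumerate(sequence):
        if residue not in "TS":
            continue
        # X(-1) and X(3) must both exist inside the sequence.
        if i < 1 or i + 3 >= len(sequence):
            continue
        if sequence[i + 1] == "P" or sequence[i + 3] == "P":
            sites.append(i)
    return sites

if __name__ == "__main__":
    for peptide in ("ATPAP", "AAATAAP", "AAATAAA"):
        print(peptide, find_formula_i_sites(peptide))
```

Because X(-1) and X(2) are arbitrary, the rule depends only on the residues at Positions 0, +1 and +3.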

The protein or peptide represented by the above 
formula can function as a substrate for UDP-GalNAc: 
polypeptide α1,O-GalNAc transferase (O-GalNAc T) 
and can be used for introducing GalNAc into a protein or 
peptide. 

According to the present invention, there is also 
provided a method of introducing GalNAc into a protein 
or peptide, comprising the steps of providing a protein 
or peptide including a sequence represented by formula 
(I), and reacting the protein or peptide as a substrate 
with UDP-GalNAc (where UDP represents uridine 
5'-diphosphate and GalNAc represents 
N-acetylgalactosamine) in the presence of UDP-GalNAc: polypeptide 
α1,O-GalNAc transferase (O-GalNAc T). 

According to the present invention, there is also 
provided a method of preparing a glycoprotein having 
mucin type sugar chains, comprising the steps of 
providing a DNA coding for a protein or peptide containing 
a sequence represented by formula (I) and expressing 
the DNA for secretion in eucaryotic cells. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a graph showing the GalNAc acceptor 
activity of various peptides. 

FIG. 2 is a graph showing which amino acid, T or S, 
is more suitable as the site for binding a sugar chain in 
GalNAc transfer. 

FIG. 3 is a graph showing the GalNAc acceptor 
activities of various peptides having a single proline 
residue. 

FIG. 4 is a graph showing the GalNAc acceptor 
activities of various peptides having two proline 
residues. 

FIG. 5 is a graph showing the GalNAc acceptor 
activities of various peptides having three proline 
residues. 

FIG. 6 is a graph showing the GalNAc acceptor 
activities of various peptides having different numbers 
of N-terminal amino acid residues. 

FIG. 7 is a graph showing the GalNAc acceptor 
activities of various peptides, where X(2) is substituted 
with various amino acids. 

FIG. 8 is a graph showing the GalNAc acceptor 
activities of various peptides, where X(2) is substituted 
with an L- or D-optical isomer. 

FIG. 9 is a graph showing the GalNAc acceptor 
activities of various peptides, where X(-1) is substituted 
with various amino acids. 

FIG. 10 is a graph showing the GalNAc acceptor 
activities of various peptides, where various amino 
acids are added to Position +4. 

FIG. 11 illustrates a construction of expression 
plasmid pGEX-3X Muc C1 which is used for the 
production of protein GST-3X Muc C1 including a peptide 
sequence having GalNAc acceptor activity at the 
C-terminal region of protein GST. 

FIG. 12 is a graph showing the amount of GalNAc 
transferred to GST-3X Muc C1 including a peptide 
sequence having GalNAc acceptor activity at the 
C-terminal region of protein GST and to control proteins 
having no such peptide sequence. 

FIG. 13 illustrates a construction of expression 
plasmid pGEY-3X 2A Muc N1 which is used for the 
production of protein GST-3X 2A Muc N1 including a 
peptide sequence having GalNAc acceptor activity at the 
N-terminal side of protein GST. The illustration is completed 
by FIG. 14. 

FIG. 14 illustrates a construction of expression 
plasmid pGEY-3X 2A Muc N1 which is used for the 
production of protein GST-3X 2A Muc N1 having a peptide 
sequence with GalNAc acceptor activity at the 
N-terminal side of protein GST. 

FIG. 15 is a graph showing the amount of GalNAc 
transferred to GST-3X 2A Muc N1 including a peptide 
sequence having GalNAc acceptor activity at the 
N-terminal side of protein GST and to control proteins having 
no such peptide sequence. 

FIG. 16 illustrates a construction of expression 
plasmid pGEX-3XS which is used for the production of 
a GST mutant including a peptide sequence having 
GalNAc acceptor activity at the C-terminal side of protein 
GST. 

FIGS. 17A, 17B and 17C are charts showing the 
structures of plasmids pGEX-3XS Muc C2, pGEX-3XS 
Muc C3 and pGEX-3XS Muc C4 for the expression of 
proteins GST-3X Muc C2, GST-3X Muc C3 and GST-3X 
Muc C4, respectively. 

FIG. 18 is a graph showing the amount of GalNAc 
transferred to proteins GST-3X, GST-3X Muc C1, 
GST-3X Muc C2, GST-3X Muc C3, GST-3X Muc C4 and 
GST-3X 2A Muc N1. 

FIGS. 19A and 19B illustrate the structures of plas- 
mids pSEGST-3X and pSEGST-3X Muc C1 for the 
secretory expression of EGST-3X and EGST-3X Muc 
C1 in COS7 cells. 

FIG. 20 is a graph showing the result of an analysis 
of glycosidase treatments of proteins EGST-3X and 
EGST-3X Muc C1 obtained by secretory expression in 
COS7 cells. 

FIG. 21 is a graph showing the result of a lectin 
blotting analysis of proteins EGST-3X and EGST-3X Muc 
C1 obtained by secretory expression in COS7 cells. 

FIG. 22 illustrates the structures of plasmids 
pEEGST-3X, pEEGST-3X Muc C1, pEEGST-3X Muc C2, 
pEEGST-3X Muc C3, pEEGST-3X Muc C4 and 
pEEGST-3X Muc C5 for the production of proteins 
EGST-3X, EGST-3X Muc C1, EGST-3X Muc C2, 
EGST-3X Muc C3, EGST-3X Muc C4 and EGST-3X Muc C5, 
respectively. 

FIG. 23 is a chart showing the result of an 
SDS-PAGE analysis of proteins EGST-3X, EGST-3X Muc C1, 
EGST-3X Muc C2, EGST-3X Muc C3, EGST-3X Muc C4 
and EGST-3X Muc C5 obtained by secretory expression 
in COS7 cells. 
DETAILED DESCRIPTION OF THE INVENTION 

Arabic numerals prefixed by "+" or "-" herein 
represent a position of an amino acid in a peptide. The 
position of the amino acid to which a sugar chain is transferred is 
called Position 0, and the positions on the N-terminal 
side and those on the C-terminal side are referred to as 
Positions -1, -2 and -3 and Positions +1, +2 and +3 in 
order. 

Amino acids are herein represented by the known 
respective single character codes as shown below. 

G: glycine 
A: alanine 
V: valine 
L: leucine 
I: isoleucine 
S: serine 
T: threonine 
D: aspartic acid 
E: glutamic acid 
N: asparagine 
Q: glutamine 
K: lysine 
R: arginine 
C: cysteine 
M: methionine 
F: phenylalanine 
Y: tyrosine 
W: tryptophan 
H: histidine 
P: proline 
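The position convention used throughout this description (Position 0 at the residue receiving the sugar chain, negative numbers toward the N-terminal, positive numbers toward the C-terminal) can be sketched as a small helper. This is a hedged illustration; `label_positions` and `format_position` are hypothetical names, and the example peptide is the twelve-residue sequence cited in the Background Art.

```python
def label_positions(sequence, site_index):
    """Map each residue to its position relative to the glycosylation site.

    site_index is the 0-based index of the residue at Position 0.
    """
    return {i - site_index: residue for i, residue in enumerate(sequence)}

def format_position(position):
    """Render a relative position in the style used in the text, e.g. '+3' or '-1'."""
    return "0" if position == 0 else f"{position:+d}"

if __name__ == "__main__":
    # PHMAQVTVGPGL spans Positions -6 through +5 when T (index 6) is Position 0.
    positions = label_positions("PHMAQVTVGPGL", 6)
    for position in sorted(positions):
        print(format_position(position), positions[position])
```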



According to the first aspect of the present 
invention, a peptide represented by formula (I) is provided. 
The peptide having a sequence represented by formula 
(I) functions efficiently as a substrate for the enzyme 
O-GalNAc T. In other words, it shows a high GalNAc acceptor 
activity. Therefore, GalNAc can be efficiently introduced 
into the peptide by reacting the peptide with 
UDP-GalNAc in the presence of O-GalNAc T. In this context, 
GalNAc is introduced to T or S at X(0). 

GalNAc can also be introduced efficiently into a protein 
that includes the peptide sequence represented by formula (I) 
by reacting the protein with UDP-GalNAc in the 
presence of O-GalNAc T. Thus, 
according to the present invention, there is also 
provided a protein including a peptide sequence 
represented by formula (I). 

A glycoprotein with mucin type sugar chains can be 
efficiently produced by secretory expression of a protein 
including a peptide sequence represented by formula (I) 
in a eucaryotic cell. 

According to the preferred embodiment of the 
present invention, there is provided a protein or peptide 
of formula (I) having a high degree of GalNAc acceptor 
activity wherein X(1) and X(3) represent P or A and either 
one of them is P. 

According to another preferred embodiment of the 
present invention, there is provided a sequence having 
a high degree of GalNAc acceptor activity represented 
by the following formula (Ia): 

X(-1)-X(0)-P-X(2)-P (Ia) 

where X(-1), X(0) and X(2) are as defined above. 

In this embodiment, X(-1) preferably represents an 
amino acid selected from Y, A, W, S, G, V, F, T and I and 
X(2) preferably represents an amino acid selected from 
A, P, C, K, R, H, S, M, T, Q, V, I, L and E. 

According to another preferred embodiment of the 
present invention, there is provided a sequence 
showing a high degree of GalNAc acceptor activity 
represented by the following formula (Ib): 

X(-1)-X(0)-X(1)-X(2)-P (Ib) 

where X(-1), X(0), X(1) and X(2) are as defined above. 

In this embodiment, X(-1) preferably represents an 
amino acid selected from Y, A, P, W, S, G, V, F, T and I 
and X(2) preferably represents an amino acid selected 
from A, P, C, K, R, H, S, M, T, Q, V, I, L and E. More 
preferably, X(-1) represents A or P, and X(1) and X(2) 
represent A. 

According to another embodiment of the present 
invention, there is provided a sequence showing a high 
degree of GalNAc acceptor activity represented by the 
following formula (Ic): 

X(-1)-X(0)-P-X(2)-X(3) (Ic) 

where X(-1), X(0), X(2) and X(3) are as defined above. 

In this embodiment, X(-1) preferably represents an 
amino acid selected from Y, A, P, W, S, G, V, F, T and I 
and X(2) preferably represents an amino acid selected 
from A, P, C, K, R, H, S, M, T, Q, V, I, L and E. More 
preferably, X(-1) represents A or P, and X(2) and X(3) 
represent A. 

The peptide or protein represented by formulae (Ia) 
and (Ib) is preferable. 
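The three sub-formulae differ only in where proline is fixed, so a five-residue window X(-1) to X(3) can be sorted into them mechanically. The following is a rough sketch under stated assumptions: the function names are hypothetical, a window with proline at both X(1) and X(3) is reported as (Ia) even though it formally also fits (Ib) and (Ic), and the preferred X(2) set is taken as the list given in the text for all three sub-formulae.

```python
# Residues listed in the text as preferred at X(2).
PREFERRED_X2 = set("APCKRHSMTQVILE")

def classify_pentapeptide(pentapeptide):
    """Return 'Ia', 'Ib', 'Ic' or None for a window X(-1)X(0)X(1)X(2)X(3)."""
    if len(pentapeptide) != 5 or pentapeptide[1] not in "TS":
        return None
    proline_at_1 = pentapeptide[2] == "P"
    proline_at_3 = pentapeptide[4] == "P"
    if proline_at_1 and proline_at_3:
        return "Ia"  # most specific case: P at both X(1) and X(3)
    if proline_at_3:
        return "Ib"  # P only at X(3)
    if proline_at_1:
        return "Ic"  # P only at X(1)
    return None

def has_preferred_x2(pentapeptide):
    """True when the residue at X(2) belongs to the preferred set."""
    return len(pentapeptide) == 5 and pentapeptide[3] in PREFERRED_X2

if __name__ == "__main__":
    for window in ("ATPAP", "ATAAP", "ATPAA", "ATAAA"):
        print(window, classify_pentapeptide(window), has_preferred_x2(window))
```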

According to another preferred embodiment of the 
present invention, X(0) represents T. In some cases, 
when X(0) is T, the GalNAc acceptor activity is more 
than about 50 times greater than its counterpart when 
X(0) is S. 

According to another preferred embodiment of the 
present invention, when one or two amino acids exist 
away from X(-1) on the N-terminal side, each of the 
amino acids is preferably A, P, G, E, Q, T, R or D. When 
one or more amino acids exist away from X(3) on the 
C-terminal side, they may be any amino acids. 

While a peptide or amino acid according to the 
invention may be either an L- or D-isomer, it is 
preferably an L-isomer. In particular, the amino acid of X(2) is 
preferably an L-isomer. 

Specific examples of a peptide sequence repre- 
sented by formula (I) include the following. A protein 
according to the invention preferably contains a peptide 
sequence selected from the following. 

ATPAP 45 

AATPAP 

AAATPAA 

AAATAAP 

PAATAAP 

APATAAP so 

AAPTAAP 

AAATAPP 

PAATPAP 

APATPAP 

AAXaTPXbP 55 
where Xa and Xb represent any amino acid provided 
that either Xa or Xb represents A, and 

AAATPAPXc 
where Xc represents any amino acid. 
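
The sequence rule of formula (I) can be illustrated with a short Python sketch that scans a one-letter amino acid sequence for five-residue windows in which X(0) is T or S and at least one of X(1) and X(3) is P. The function name and the 0-based index convention are assumptions made for this illustration only; the sketch tests the sequence rule and says nothing about the actual level of GalNAc acceptor activity.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes

# Formula (I): X(-1)-X(0)-X(1)-X(2)-X(3), with X(0) = T or S and at least
# one of X(1) and X(3) being P. The lookahead lets overlapping windows match.
FORMULA_I = re.compile(
    rf"(?=([{AA}][TS](?:P[{AA}][{AA}]|[{AA}][{AA}]P)))"
)

def find_formula_i_sites(sequence):
    """Return (0-based index of X(0), five-residue window) for each match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in FORMULA_I.finditer(seq)]

print(find_formula_i_sites("AAATPAP"))
print(find_formula_i_sites("AAATAAA"))
```

For AAATPAP the only window reported is ATPAP, with the threonine at index 3; AAATAAA and TPAP give no match because the former contains no proline and the latter has no residue at X(-1).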



A protein or peptide according to the invention may 
be synthesized by means of a technique of genetic 
engineering or by chemical synthesis. 

According to another aspect of the present inven- 
tion, there is provided a method of introducing mucin 
type sugar chains into a protein or peptide. 

The introduction of sugar or sugar chains into a protein
or peptide according to the present invention is conducted
in a manner as described below. When the
introduction is conducted in vitro, a protein or peptide
according to the invention is prepared and reacted with
UDP-GalNAc as a sugar donor in the presence of
O-GalNAc T, preferably in a buffer solution containing
MnCl2 or Triton X-100. The sugar donor and the sugar
acceptor may be used at any concentration up to
saturation. While the enzyme O-GalNAc T
may also be used without limitation, it is
preferably used at a rate of about 10 mU to 10 U per 1
milliliter of the reaction solution. The pH of the buffer
solution is preferably about 7. The use of an
imidazole-hydrochloric acid buffer solution having a pH value of
about 7.2 is preferable. The reaction is generally conducted
at 25 to 37°C and is complete in several minutes
to several tens of hours depending on the conditions.

According to the present invention, sugar chains
may be introduced into a protein or peptide by means of
a biosynthesis pathway of eucaryotic cells having the
enzyme O-GalNAc T. More specifically, a protein or
peptide having mucin type sugar chains can be
obtained by secretory expression of a protein or peptide
according to the present invention in an eucaryotic cell
having the enzyme O-GalNAc T. It may be safe to
presume that the introduced sugar chain is bound to
amino acid X(0) of the sequence represented by formula (I).
Preferable examples of eucaryotic cells having
O-GalNAc T include animal cells such as COS7,
COS1, BHK, C127 and CHO and insect cells such as
Sf9 and Sf21.

In the present invention, when sugar chains are 
introduced by means of a biosynthesis pathway of 
eucaryotic cells, the protein has to be secreted out of 
the cells. Therefore, when the protein or peptide into 
which sugar chains are to be introduced cannot be easily
secreted out of the eucaryotic cell, it is preferable that
the peptide is expressed as a precursor having a signal 
peptide attached thereto. By the secretion of the protein 
from the cell, sugar chains can be introduced and the 
intended protein can be obtained as mature protein. 

Known techniques of genetic engineering
can be used for secretory expression of a protein or peptide
according to the present invention. It may be obvious
to one skilled in the art that a protein or peptide
having a sequence as represented by formula (I) can be
expressed in cells as described above. A sequence represented
by formula (I) can be inserted into or added to a
desired position of a protein or peptide, or a sequence
at the desired position of the protein or peptide can be
replaced with a sequence represented by formula (I).

More specifically, according to the present invention,
there is provided a method of preparing a protein or
a peptide having a mucin type sugar chain comprising
the steps of:

transforming an eucaryotic cell with a DNA coding
for a protein or peptide according to the invention;
and

expressing the protein or peptide in the transformed 
cells and secreting the protein or peptide from the 
eucaryotic cell. 

According to another aspect of the invention, there 
is provided a method of introducing a mucin type sugar 
chain into a desired position of a protein or peptide of 
interest comprising the steps of: 

inserting or adding a DNA coding for a sequence 
represented by formula (I) into a position which is in 
a DNA coding for the protein or peptide of interest 
and is corresponding to the position where a mucin 
type sugar chain is intended to be introduced, or 
replacing a partial DNA fragment including the posi- 
tion with a DNA coding for a sequence represented 
by formula (I), thereby a DNA coding for a protein or 
peptide containing the DNA coding for the 
sequence represented by formula (I) is obtained; 
transforming an eucaryotic cell with the DNA 
obtained in the above step; and 
expressing the protein or peptide in the transformed 
cell and secreting the protein or peptide having a 
mucin type sugar chain from the cell. 

The DNA coding for a protein or peptide
containing the DNA coding for the sequence represented
by formula (I) is preferably in the form of a vector.
More preferably, it is in the form of an expression vector
including various sequences for promoting or regulating
the expression. Without undue experimentation, one skilled
in the art can select a vector suitable for the present
invention from a group of vectors used in the field of
genetic engineering and can also construct an expression
vector which is useful in the present invention. As
described above, an eucaryotic cell to be used in the
present invention may be a cell having O-GalNAc T. In
addition, vectors that can be used for such cells and can
suitably be used in the present invention are known
(see Molecular Cloning [J. Sambrook et al., Cold
Spring Harbor Laboratory Press (1989)] and Baculovirus
Expression Vectors: a laboratory manual [D. R.
O'Reilly et al., W. H. Freeman and Company (1992)]).

Thus, according to the present invention, there is 
provided a DNA sequence coding for a sequence repre- 
sented by formula (I) along with a DNA sequence cod- 
ing for a protein or peptide including a sequence 
represented by formula (I). 

A glycoprotein or glycopeptide produced by a
method according to the present invention can be easily
isolated and purified from the solution after the reaction
by using a known appropriate technique. The techniques
include affinity column chromatography, gel filtration
column chromatography and reversed phase
column chromatography. The reaction product can be
collected by concentration and/or lyophilization.

With a method of introducing sugar chains according
to the present invention, a mucin type sugar chain
can be introduced into a desired position of a protein or
peptide whose structure is known. Furthermore, a
sequence represented by formula (I) consists of only
five amino acids. With this short sequence, a sugar
chain can be introduced with high probability into a protein or
peptide having such a short sequence. Thus, according
to the present invention, a protein or peptide chain can
advantageously be modified to be linked with mucin
type sugar chains without affecting the structure of a
protein or peptide whose structure is already
known. A technique according to the present invention
will find a wide variety of applications in the pharmaceutical
industry and other industries.

In addition, a glycoprotein or glycopeptide prepared
according to the present invention may be used as a
substrate for a variety of glycosidases and glycosyltransferases.
Therefore, the protein can be used for
detecting useful enzymes. For instance, it can advantageously
be used for preparing substrates of enzymes
that can take part in the formation of mucin type sugar
chains. More specifically, AAAT(α-GalNAc)PAP, which can
be obtained by using AAATPAP, can be used for detecting
and measuring α-N-acetylgalactosaminidase or
UDP-Gal:GalNAc-polypeptide β1,3-galactosyltransferase
in samples that may be derived from living things
including microorganisms, insects, animals, plants and
their cell culture solutions.

Furthermore, a carrier having peptides can be prepared
by providing a peptide according to the present
invention and thereafter binding it to an activated
carboxyl agarose or cyanogen bromide activated agarose.
Such a carrier having peptides can advantageously be
used for purifying a mucin type glycosyltransferase such
as O-GalNAc T.

EXAMPLES 

The present invention will be described in detail by
way of examples, which by no means limit the scope of
the invention.

In the following examples, the purification of O-GalNAc T,
the peptide synthesis and the measurement of
the GalNAc acceptor activity of the synthesized peptide
were carried out in a manner as described below.

Purification of O-GalNAc T



O-GalNAc T derived from bovine colostrum was
purified by a method reported by A. Elhammer et al. [J.
Biol. Chem., Vol.261, pp.5249-5255 (1986)].

More specifically, about 4.8 liters of bovine colostrum
was centrifuged to remove the butter component and
then subjected to ultracentrifugation. 800 milliliters
of glycerol was added to the obtained 3.2 liters of solution
to produce a crude enzyme solution, which was subsequently
subjected to a 4-step purifying process including
DEAE-Sephacel column chromatography,
ultrafiltration and apomucin-Sepharose 4B column
chromatography I and II to produce highly but partially
purified preparations.

Synthesis of Peptide 

Peptides were synthesized by the Fmoc solid phase
method [N. Izumiya et al., Bases and Experiments on
Peptide Synthesis, Maruzen, 1985] with a PS3 Automatic
Peptide Synthesizer available from Protein Technologies.
The structure of each synthesized peptide was analyzed
and the peptide was quantified by mass spectrometry (Mass Analyzer
API III: available from Perkin-Elmer) and amino acid
composition analysis (Amino Acid Composition Analyzer
JLC-300: available from Nippon-Denshi).

Measurement of GalNAc Acceptor Activity of Peptide 

The GalNAc acceptor activity of a peptide was determined
by a method proposed by J. M. Cottrell et al.
[Biochem. J., Vol.283, pp.299-305 (1992)] and A. P.
Elhammer et al. [J. Biol. Chem., Vol.268, pp.10029-10038
(1993)].

More specifically, 50 µl of a reaction solution (50
mM imidazole-HCl (pH 7.2), 10 mM MnCl2, 0.5% Triton
X-100, 150 nM UDP-[3H]GalNAc) containing 100 nmol
of a synthesized peptide and 0.5 to 500 mU of partially
purified O-GalNAc T derived from bovine colostrum was
prepared and incubated at 37°C for 30 minutes
to 5 hours. The reaction was terminated by adding
50 µl of 100 mM EDTA. The mixture was subsequently applied to a
1 ml ion exchange column (AG1-X8, Cl form: available
from Japan Bio-Rad Laboratory) and the reaction product
was eluted with 2.5 ml of water. 10 ml of a cocktail
for liquid scintillation counters (Atomlight: available from
Dupont) was added to the eluted fraction, and the
radioactivity was measured with a liquid scintillation counter
for 2 minutes.

The GalNAc acceptor activity was expressed in 
terms of the initial reaction velocity relative to a peptide 
PPASTSAPG which was set to 100%. 
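
The normalization described above can be sketched in Python as follows. The function name and the numerical velocities are hypothetical and serve only to illustrate the arithmetic; they are not measured values from the examples.

```python
def relative_acceptor_activity(initial_velocities, reference="PPASTSAPG"):
    """Express each initial velocity as a percentage of the reference peptide."""
    ref = initial_velocities[reference]
    if ref <= 0:
        raise ValueError("reference velocity must be positive")
    return {pep: 100.0 * v / ref for pep, v in initial_velocities.items()}

# Hypothetical initial velocities in arbitrary units (illustration only).
example = {"PPASTSAPG": 2.0, "PEPTIDE_X": 5.0}
print(relative_acceptor_activity(example))
```

Under these assumed inputs the reference peptide maps to 100% and the hypothetical peptide to 250%.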

Example 1 : GalNAc transfer into various peptides 

The GalNAc acceptor activity of each of the pep- 
tides listed in FIG. 1 was measured. 

These peptide sequences are either those derived
from mucin type glycoproteins or those that are already
known as GalNAc acceptors with O-GalNAc T. More
specifically, PGGSATPQ, SGGSGTPG, GEPTSTP,
PDAASAAP and ALQPTQGA are respectively derived
from mucin of swine submaxillary gland, mucin of sheep
submaxillary gland, bovine κ-casein, human erythropoietin
and human granulocyte colony stimulating factor.
RTPPP and VTRTPPP are derived from bovine myelin
and J. D. Young et al. reported that GalNAc was transferred
to these peptides [Biochemistry, Vol.18, pp.4444-4448
(1979)]. With regard to PPASTSAPG, A. P. Elhammer
et al. reported that GalNAc was transferred to this
peptide [J. Biol. Chem., Vol.268, pp.10029-10038
(1993)].

The results are shown in FIG. 1. As seen from FIG.
1, PPASTSAPG, which may be an ideal peptide
sequence according to A. P. Elhammer et al., showed
the highest activity of all, followed by RTPPP and
VTRTPPP. On the other hand, all the peptide
sequences derived from natural mucin type proteins
showed a low activity, and no GalNAc transfer was
observed in the peptide sequences derived respectively
from the submaxillary gland mucins.

Example 2 : Influence of amino acids, threonine and ser- 
ine, at the binding site of a sugar chain 

A mucin type sugar chain is bound to threonine or
serine. Therefore, the preference between these amino acids in
GalNAc transfer was tested for comparison.

Among the peptides listed in Example 1, the peptides
derived from erythropoietin and myelin which contain
only one serine or threonine residue were used.
They clearly showed a GalNAc transfer more than any
other peptides used in Example 1. Then, PDAATAAP
and PDAASAAP derived from erythropoietin and
RTPPP and RSPPP derived from myelin were prepared.

FIG. 2 shows the results. As seen from FIG. 2, threonine
was 40 to 50 times more active than serine in
each case. PDAATAAP showed an activity level about 4
times higher than that of PPASTSAPG, which showed
the highest activity level in Example 1.

Example 3 : GalNAc transfer to peptides containing a 
single proline residue 

The influence of a proline residue in a peptide on the
GalNAc transfer reaction was then examined, because
proline was relatively frequently observed in the peripheral
sequence of the amino acid where a mucin type
sugar chain was bound and because each of the peptides
that showed a high GalNAc acceptor activity in
Examples 1 and 2 contained several proline residues.
As the first step, one alanine residue in AAATAAA was replaced
with a single proline residue to prepare various
peptides, and the GalNAc acceptor activities of the peptides
were compared. The prepared peptides were
AAATAAA, PAATAAA, APATAAA, AAPTAAA, AAATPAA,
AAATAPA and AAATAAP.

The GalNAc acceptor activity of each of the peptides
is shown in FIG. 3. While AAATAAA scarcely
showed any activity, AAATPAA and AAATAAP showed a
significantly high level of activity. Consequently, it was
found that the existence of proline at Position +1 and +3,
particularly at Position +3, is important for GalNAc
acceptor activity. AAATAAP showed an activity level
about twice as high as that of PPASTSAPG.



Example 4 : GalNAc transfer to peptides containing two
proline residues

The results of Example 3 showed that a significant
effect can be produced by introducing a single proline
residue into a specific site. Therefore, in this Example, a
second proline residue was introduced to each alanine
position of AAATAAP, which showed the highest activity
level in Example 3, in order to find out the effect of the
second proline residue. The peptides prepared for comparing
the GalNAc acceptor activities were AAATAAA,
AAATAAP, PAATAAP, APATAAP, AAPTAAP, AAATPAP
and AAATAPP.

FIG. 4 illustrates the results. As seen from FIG. 4,
AAATPAP showed an acceptor activity level about three
times higher than that of AAATAAP. This proves that the
GalNAc transfer to peptide was synergistically promoted
when two proline residues were provided at both
Positions +1 and +3. AAATPAP showed an activity level
about seven times as high as that of PPASTSAPG. The
effect of the proline at Position +3 remained stable even
when the alanine residues at Positions -3, -2, -1 and +2
were switched to proline.

Example 5 : GalNAc transfer to peptides containing 
three proline residues 

As proved in Example 4, the two proline residues at
Positions +1 and +3 in a peptide sequence greatly
improve the GalNAc acceptor activity. In this Example,
therefore, a third proline residue was introduced to each
alanine position of AAATPAP in order to find out the
effect of the third proline residue. The peptides prepared
for comparing the GalNAc acceptor activities
were AAATAAP, AAATPAP, PAATPAP, APATPAP, AAPTPAP
and AAATPPP.

FIG. 5 illustrates the results. As seen from FIG. 5,
the GalNAc acceptor activity level did not significantly
increase when a proline residue was introduced at a
position other than +1 and +3. This clearly indicates that
two proline residues at Position +1 and +3 are important
for the GalNAc acceptor activity. While the third proline
introduced into any of Positions -3, -2 and +2 did not
cause any significant change in the activity, the level
decreased remarkably when it was introduced into Position
-1. These results indicate that the effect of
the two proline residues at Positions +1 and +3 was not
basically affected by the amino acids at the remaining
positions. They also suggest that, when proline residues,
which are unique amino acids, are located on both
sides (-1 and +1) of the threonine to which GalNAc is
transferred, an unusual peptide structure may be formed
to reduce the activity.

In view of the results of Examples 3 to 5, it is clear
that the requirement for a peptide to accept GalNAc is
not that one or more proline residues exist at random
but that they should be located at specific positions. Further,
the requirement for a peptide to have higher GalNAc
acceptor activity is not that the number of proline
residues around the serine or threonine to which GalNAc is
transferred is merely increased but that they are located
at specific and limited positions.

Example 6 : GalNAc transfer to peptides with different
numbers of amino acids on the N-terminal side of the
binding site

In view of the results of Examples 3 to 5, it is also
clear that the amino acids located at Positions from -3 to
-1 on the N-terminal side do not significantly affect the
GalNAc acceptor activity regardless of whether they are alanine
or proline. Therefore, peptides with different numbers of
amino acid residues on the N-terminal side were prepared
and tested for GalNAc acceptor activity in order to
find out if the amino acid residues at Positions from -3 to
-1 were really necessary for GalNAc acceptor activity.
The peptides prepared were AAATPAP, AATPAP, ATPAP
and TPAP.

FIG. 6 illustrates the results. As seen from FIG. 6,
ATPAP showed a high GalNAc acceptor activity,
whereas that of TPAP was dramatically low. Thus, it was
proved that, in order to obtain a high GalNAc acceptor
activity, at least one amino acid residue is required on
the N-terminal side of the threonine or serine to which GalNAc
is transferred. At the same time, it was also found
that the presence of two or more amino acids on the
N-terminal side of threonine is preferable but not essential,
because the GalNAc acceptor activity increases
depending on the number of amino acids on the N-terminal
side up to three amino acid residues.






Example 7 : GalNAc transfer to peptides with different 
amino acids at position +2 



The results of Example 6 showed that peptide
sequences having ATPAP in common show a high level of
GalNAc acceptor activity. This peptide has two proline
residues. Since proline has a unique structure among
various amino acids, it was quite probable that the
amino acid residue located between the two proline residues,
at Position +2, could significantly affect the peptide
structure. Thus, 20 peptides having different amino
acids at Position +2 were prepared and compared for
GalNAc acceptor activity. The prepared peptides were
AAATPAP, AAATPPP, AAATPCP, AAATPKP, AAATPRP,
AAATPHP, AAATPSP, AAATPMP, AAATPTP, AAATPQP,
AAATPVP, AAATPIP, AAATPLP, AAATPEP, AAATPGP,
AAATPYP, AAATPWP, AAATPFP, AAATPNP and
AAATPDP.
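
A substitution series of this kind can be enumerated programmatically. The following Python sketch is an illustration only; the helper name and the 0-based index convention (the threonine of AAATPAP at index 3, so Position +2 at index 5) are assumptions and are not part of the described experiments.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids, one-letter codes

def substitution_series(base, index):
    """Return the 20 peptides obtained by placing each amino acid at one index."""
    return [base[:index] + aa + base[index + 1:] for aa in AMINO_ACIDS]

# Position +2 of AAATPAP corresponds to index 5 (T, Position 0, is at index 3).
plus2_series = substitution_series("AAATPAP", 5)
print(len(plus2_series), plus2_series[:3])
```

The same helper gives the Position -1 series with index 2.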

FIG. 7 illustrates the results. As seen from FIG. 7,
the peptide generally shows a high GalNAc acceptor activity
if proline exists at Position +1 and +3 regardless
of the side chain of the amino acid at Position +2. On the
other hand, the activity may vary depending on the side
chain of the amino acid at Position +2. In particular,
each peptide having alanine, proline, cysteine, lysine,
arginine, histidine, serine, methionine, threonine,
glutamine, valine, isoleucine, leucine or glutamic acid
shows a higher GalNAc acceptor activity than the peptide
having glycine, tyrosine, tryptophan, phenylalanine,
asparagine or aspartic acid. The results indicate that the
amino acid at Position +2 preferably has a relatively
small side chain and a positive charge.

Example 8 : GalNAc transfer to peptides with optical isomers
of amino acids at Position +2

The results of Example 7 suggest that the side
chain of the amino acid at Position +2 of the basic peptide
sequence, ATPAP, having the high GalNAc acceptor
activity, might affect the activity. Thus, with regard to
the alanine at Position +2, D- and L-optical isomers
were prepared and compared for GalNAc acceptor
activity. The prepared peptides were PAATAAP,
APATAAP, AAPTAAP and AAATPAP, for which D- and
L-optical isomers of alanine were formed at Position +2.

FIG. 8 shows the results. As seen from FIG. 8, the
L-isomers generally showed a higher activity than the
D-isomer counterparts, although the latter also provided
a high level of activity. Consequently, it was confirmed
that the amino acid at Position +2 may be the
D-isomer, which is the mirror image of its natural counterpart,
and that the side chain of the amino acid residue at
Position +2 has little significance in terms of GalNAc
acceptor activity.


Example 9 : GalNAc transfer to peptides with different 
amino acids at Position -1 

The results of Example 6 showed that the amino
acid at Position -1 is important for GalNAc acceptor
activity. Therefore, 20 peptides having different amino
acids at Position -1 were prepared and compared for
GalNAc acceptor activity. The prepared peptides were
AAYTPAP, AAATPAP, AAWTPAP, AASTPAP, AAGTPAP,
AAVTPAP, AAFTPAP, AATTPAP, AAITPAP, AAHTPAP,
AAMTPAP, AAQTPAP, AACTPAP, AANTPAP, AAPTPAP,
AALTPAP, AARTPAP, AAETPAP, AADTPAP and AAKTPAP.

FIG. 9 illustrates the results. As seen from FIG. 9,
the peptides generally show a high GalNAc acceptor
activity level if proline exists at +1 and +3 regardless of
the side chain of the amino acid at Position -1. However,
the activity level may vary relatively significantly
depending on the side chain of the amino acid at Position
-1. In particular, each peptide having tyrosine,
alanine, tryptophan, serine, glycine, valine, phenylalanine,
threonine or isoleucine shows a higher GalNAc
acceptor activity than that having histidine, methionine,
glutamine, cysteine, asparagine, proline, leucine,
arginine, glutamic acid, aspartic acid or lysine. These
results show that the amino acid at Position -1 is preferably
an uncharged or aromatic amino acid, although the size of the side
chain has little to do with the GalNAc acceptor activity.

Example 10 : GalNAc transfer to peptides with different
amino acids at Position +4

The results of Examples 1 through 9 showed that
the motif of a peptide sequence having a high GalNAc
acceptor activity level is X(-1)-T-P-X(2)-P, where X(-1)
and X(2) represent any amino acid. In this motif, the
proline at Position +3 is important and the C-terminal
side should not be made shorter than it. On the other
hand, the significance of the amino acid at Position +4
remained unknown. Therefore, 20 peptides having
different amino acids at Position +4 were prepared and
compared for GalNAc acceptor activity. The prepared
peptides were AAATPAP, AAATPAPG, AAATPAPQ,
AAATPAPE, AAATPAPA, AAATPAPN, AAATPAPD,
AAATPAPR, AAATPAPC, AAATPAPI, AAATPAPV,
AAATPAPS, AAATPAPK, AAATPAPY, AAATPAPL,
AAATPAPT, AAATPAPW, AAATPAPM, AAATPAPP,
AAATPAPF and AAATPAPH.

FIG. 10 illustrates the results. As seen from FIG.
10, the amino acid added at Position +4 has little to do
with the GalNAc acceptor activity and the effect of the
side chain is even lower than that of the side
chain at Position -1 or +2. Thus, the amino acid at Position
+4 is neither significant nor essential. However,
it may have a certain effect in some cases
because a slightly higher activity level is obtained with glycine,
glutamine, glutamic acid, alanine, asparagine,
aspartic acid, arginine, cysteine and isoleucine.

Example 11 : Alteration of a peptide to the mucin type
glycoprotein by inserting peptide sequence having GalNAc
acceptor activity

A peptide sequence having a GalNAc acceptor 
activity was introduced into a protein to confirm that a 
mucin type sugar chain can bind to it. 

As a model protein, a derivative of glutathione
S-transferase (GST) from Schistosoma japonicum was
used. The derivative of GST (GST-3X) can easily be
prepared on a mass production basis from E. coli with
commercially available plasmid pGEX-3X (Pharmacia
Biotech). The derivative of GST had a peptide
sequence SDLIEGRGIPGNSS added to the C-terminal
of native GST. The gene of the protein contained in
plasmid pGEX-3X was used.

A recombinant gene coding for a mutant protein in
which a peptide sequence having a GalNAc acceptor
activity was inserted in a downstream region of GST-3X
was constructed. The construction of the gene is illustrated
in FIG. 11. The procedures for gene manipulation
were according to the methods described in Molecular
Cloning [J. Sambrook et al., Cold Spring Harbor Laboratory
Press (1989)], unless otherwise noted.

The sequence MAAATPAPM containing AAATPAP,
which showed the high level of activity, was used as a sequence
having the GalNAc acceptor activity. The DNA coding
for the peptide sequence MAAATPAPM was prepared in
the following manner. The following two single-strand
DNAs were prepared with a 394 DNA/RNA Synthesizer
available from Applied Biosystems.

5'-AAGGATCCCCATGGCAGCAGCAACGCCGGCACCCATGGGGAATTCAA-3' (Synthesized DNA 1)
5'-TTGAATTCCCCATGGGTGCCGGCGTTGCTGCTGCCATGGGGATCCTT-3' (Synthesized DNA 2)
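
As a consistency check on the two oligonucleotides, the following Python sketch confirms that Synthesized DNA 2 is the reverse complement of Synthesized DNA 1 and translates Synthesized DNA 1 from its first ATG with the standard genetic code. The helper names are assumptions made for this illustration, and the sequences are copied from the text with the line-break hyphens removed.

```python
DNA1 = "AAGGATCCCCATGGCAGCAGCAACGCCGGCACCCATGGGGAATTCAA"
DNA2 = "TTGAATTCCCCATGGGTGCCGGCGTTGCTGCTGCCATGGGGATCCTT"

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate_from_first_atg(seq):
    """Translate complete codons from the first ATG; stop codons appear as '*'."""
    start = seq.find("ATG")
    if start < 0:
        return ""
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

print(reverse_complement(DNA1) == DNA2)
print(translate_from_first_atg(DNA1))
```

The translation begins with MAAATPAPM, followed by the residues G, N and S contributed by the bases around the EcoR I end of the oligonucleotide.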

Subsequently, 50 µl of a solution containing 10 mM
Tris-HCl (pH 8.0), 5 mM MgCl2, 100 mM NaCl, 1 mM
2-mercaptoethanol and 1 nmol of each of the above synthesized
DNAs was prepared. The solution was then
warmed to 75°C for 10 minutes and thereafter left at
room temperature for annealing to produce a double-strand
DNA, which was the desired DNA. A 5 µl portion
of the solution thus obtained was taken and the double-strand
DNA was cut with EcoR I and BamH I and
inserted between the same restriction enzyme sites of
pGEX-3X to construct plasmid pGEX-3X Muc C1
according to a conventional method. The plasmid contained
a DNA encoding the mutant, GST-3X Muc C1, in
which MAAATPAPM was inserted between the 228th
proline and the 229th glycine of GST-3X. The
sequence of the inserted region was confirmed with a 373A
DNA Sequencer (Applied Biosystems) using the 5' pGEX
Sequencing Primer (Pharmacia Biotech) and the PRISM
Dye Terminator Cycle Sequencing Kit (Applied Biosystems).

The mutant protein containing the peptide
sequence of MAAATPAPM inserted in a C-terminal
region of GST-3X was prepared by utilizing E. coli in the
following manner. E. coli BL21 (Pharmacia Biotech)
was transformed with pGEX-3X Muc C1 by means of
the CaCl2 method and then precultured with shaking
overnight at 37°C in 5 ml of 2 x
YTG culture medium (16 g/l tryptone, 10 g/l yeast
extract, 5 g/l NaCl, pH 7.0) containing 100 µg/ml of
ampicillin. Subsequently, the preculture was transferred to 500 ml of the
same culture medium and cultured with shaking for 2 5
hours at 37°C (O.D. = 0.5-1.0). A portion of 100 mM
isopropyl-β-D-thiogalactopyranoside (IPTG) was
added to the culture solution to achieve a final concentration
of 0.5 mM. The solution was then centrifuged at
5,000 rpm (4,470 x g) for 10 minutes at 4°C to collect
cells, which were then washed with 50 ml of 20 mM
Tris-HCl (pH 7.5) and 140 mM NaCl and subjected to
another centrifugation under the same conditions for
collection. The cells were resuspended in 50 ml of 20
mM Tris-HCl (pH 7.5) and 140 mM NaCl and lysed with
an ultrasonic processor. The product was centrifuged at
15,000 rpm (27,700 x g) for 30 minutes at 4°C, the
supernatant was filtered through a membrane with a pore
size of 0.22 µm, and 10% Triton X-100 was added to
achieve a final concentration of 0.1%. The obtained
solution was used as a crude enzyme solution.

The crude enzyme solution was applied to 1 ml of a Glutathione
Sepharose 4B column (Pharmacia Biotech)
which had been equilibrated in advance with 20 mM Tris-HCl
(pH 7.5), 140 mM NaCl and 0.1% Triton X-100, and the column was then
washed with 20 mM Tris-HCl (pH 7.5), 140 mM NaCl
and 0.1% Triton X-100. Subsequently, 1 ml of a solution
containing 50 mM Tris-HCl (pH 8.0), 140 mM NaCl,
0.1% Triton X-100, 5 mM dithiothreitol and 10 mM glutathione
(reduced form) was added to the column,
which was left to stand for 10 minutes at room temperature
and then eluted with 9 ml of the same buffer solution to
obtain 1 ml fractions. The protein in
the eluted fractions was quantified with the Protein Assay
Kit (Japan Bio-Rad Laboratory) and the GST activity was
measured with the GST Detection Module (Pharmacia Biotech).
The gel electrophoresis (SDS-PAGE) of the protein
was performed by a method proposed by U. K.
Laemmli [Nature (London), Vol.227, pp.680-685 (1970)]
using a 13% gel as a separation gel. The GST-3X Muc C1
detected in the eluted fractions showed a GST activity
equal to that of GST-3X and a single band with a molecular
weight of about 28 K on SDS-PAGE.

A control sample of GST-3X was also prepared in a
similar manner as above.

The GalNAc transfer to GST-3X and GST-3X Muc
C1 was analyzed according to the method described for
determining the GalNAc acceptor activity of peptide
except that 5 nmol of the GST-3X mutant was used
instead of 100 nmol of peptide.

FIG. 12 shows the results. As seen from FIG. 12, no
substantial GalNAc transfer to GST-3X was observed,
whereas GalNAc transfer to GST-3X Muc C1 increased
as time went by. This fact shows that a protein can be
altered to a mutant that can bind a mucin type sugar
chain by inserting a peptide sequence having a GalNAc
acceptor activity into the protein.

Example 12 : Alteration of a peptide to the mucin type 
glycoprotein by adding peptide sequence having Gal- 
NAc acceptor activity 




In this example, the model protein GST-3X was used
as in Example 11. However, in this example a
peptide sequence having a GalNAc acceptor activity
was added to the N-terminal side of the protein to confirm
that the protein can be altered to show an ability of
binding a mucin type sugar chain.

The construction of a mutant gene was conducted 
according to the procedures illustrated in FIGS. 13 and 
14. 

The GST-3X gene in pGEX-3X does not have any
restriction site in the N-terminal region for inserting a
DNA fragment of a peptide sequence having a GalNAc
acceptor activity. Therefore, a gene for GST-3X 2A having
a restriction site of Nco I was prepared by polymerase
chain reaction (PCR). At first, the following primers
were synthesized with a 394 DNA/RNA Synthesizer available
from Applied Biosystems.

5'-GTATCCATGGCCCCTATACTAGGTTATTGG-3' (Synthesized DNA 3)

5'-TACTGCAGTCAGTCAGTCACGATGAATTCC-3'
(Synthesized DNA 4)
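
The restriction sites carried by the two primers can be located with a short Python sketch. The recognition sequences used (Nco I: CCATGG, Pst I: CTGCAG, BamH I: GGATCC, EcoR I: GAATTC) are the standard ones and are not stated in the text itself; the helper name and 0-based indexing are assumptions made for this illustration.

```python
# Standard recognition sequences of the enzymes named in Examples 11 and 12.
SITES = {
    "Nco I": "CCATGG",
    "Pst I": "CTGCAG",
    "BamH I": "GGATCC",
    "EcoR I": "GAATTC",
}

def find_sites(seq):
    """Map each enzyme whose site occurs in seq to the 0-based index of its first occurrence."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}

PRIMER_3 = "GTATCCATGGCCCCTATACTAGGTTATTGG"  # Synthesized DNA 3
PRIMER_4 = "TACTGCAGTCAGTCAGTCACGATGAATTCC"  # Synthesized DNA 4

print(find_sites(PRIMER_3))
print(find_sites(PRIMER_4))
```

Synthesized DNA 3 carries the Nco I site, whose ATG is followed by the codon GCC (alanine), and Synthesized DNA 4 carries the Pst I site near its 5' end.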

The PCR was conducted in a DNA Thermal Cycler (trade name;
Perkin-Elmer) with a reaction solution containing 2.5 ng of
template DNA (pGEX-3X), 0.5 µM of Synthesized DNA 3,
0.5 µM of Synthesized DNA 4, 8 µl of dNTP mixture
(Takara Shuzo), 10 µl of 10 x AmpliTaq DNA Polymerase
Buffer (Takara Shuzo) and 2.5 units of AmpliTaq DNA
Polymerase (Perkin-Elmer), overlaid with two drops of
mineral oil (Takara Shuzo). The PCR consisted of 35 cycles
of 1 minute at 94°C, 2 minutes at 55°C and 2 minutes at
72°C, followed by a single step of 10 minutes at 72°C
and cooling to 4°C. After the reaction, Pronase K
(Boehringer Mannheim), ethylenediaminetetraacetic
acid disodium salt (EDTA) and sodium
dodecyl sulfate (SDS) were added to final concentrations
of 12 mg/ml, 10 mM and 0.8%, respectively. The mixture
was kept at 37°C for 30 minutes and then at 65°C for 10 minutes.
Thereafter, the PCR reaction product was extracted with
phenol, purified by ethanol precipitation and cut with
restriction enzymes Nco I and Pst I to produce the
desired DNA. The DNA was then inserted between the
same restriction sites of pSL1190 (Pharmacia Biotech)
according to a conventional method. The sequence of
the inserted DNA was analyzed with a 373A DNA
Sequencer (Applied Biosystems) using the PRISM Dye
Primer Cycle Sequencing Kit (Applied Biosystems). The
obtained DNA of GST-3X 2A was characterized in that
the second amino acid from the N-terminal of
GST-3X, serine, was changed to alanine by introducing the
Nco I site.
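
The primer design described above can be checked with a short script. The following is only an illustrative sketch: the recognition sequences of Nco I (CCATGG) and Pst I (CTGCAG) are standard values not stated in this text, the helper names are hypothetical, and it is assumed that the ATG inside the Nco I site of Synthesized DNA 3 is the start codon of the GST-3X 2A gene.

```python
# Illustrative check of the primers of Example 12 (not part of the original procedure).
# Assumption: standard recognition sequences for Nco I and Pst I.
NCO_I = "CCATGG"
PST_I = "CTGCAG"

SYNTHESIZED_DNA_3 = "GTATCCATGGCCCCTATACTAGGTTATTGG"
SYNTHESIZED_DNA_4 = "TACTGCAGTCAGTCAGTCACGATGAATTCC"

# Only the codons needed for this check (standard genetic code).
CODON_TABLE = {"ATG": "M", "GCC": "A", "CCT": "P", "ATA": "I"}


def first_codons(seq: str, count: int = 4) -> list[str]:
    """Return `count` codons starting at the first ATG in `seq`."""
    start = seq.find("ATG")
    return [seq[i:i + 3] for i in range(start, start + 3 * count, 3)]


nco_pos = SYNTHESIZED_DNA_3.find(NCO_I)  # 0-based position of the Nco I site
pst_pos = SYNTHESIZED_DNA_4.find(PST_I)  # 0-based position of the Pst I site
codons = first_codons(SYNTHESIZED_DNA_3)
peptide = "".join(CODON_TABLE[c] for c in codons)

print(nco_pos, pst_pos, codons, peptide)
```

Under these assumptions, the second codon read from Synthesized DNA 3 is GCC (alanine), which agrees with the serine-to-alanine change attributed to the introduction of the Nco I site.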

Then, the GST-3X 2A DNA was cut from pSL1190
with Nco I and Pst I, and inserted between the same
restriction sites of pTrc99A (Pharmacia Biotech) to
obtain pGEY-3X 2A. The plasmid pGEY-3X 2A is very
similar to pGEX-3X except that it contains the GST-3X 2A
gene having an Nco I site at the N-terminal, where a
DNA can be inserted, and that the promoter is switched
from tac to trc.

Finally, plasmid pGEY-3X 2A Muc N1, containing
the GST-3X 2A Muc N1 gene having the peptide
sequence MAAATPAP at the N-terminal, was prepared
in the manner described below. A gene coding
for the peptide sequence MAAATPAP was prepared by
synthesizing single-strand Synthesized DNA 1 and
Synthesized DNA 2, the same as those of Example 11, and
annealing them in 50 µl of a solution containing 50 mM
Tris-HCl (pH 7.5), 10 mM MgCl2, 100 mM NaCl and 1
mM 2-mercaptoethanol. In 5 µl of the solution thus
obtained, the double-strand DNA was cut with Nco I and
inserted into the Nco I site of pGEY-3X 2A to produce
plasmid pGEY-3X 2A Muc N1. The plasmid contains a
DNA coding for a mutant, GST-3X 2A Muc N1, which has
MAAATPAP upstream of the methionine at the N-terminal
of GST-3X and in which the serine at the second
position of GST-3X has been changed to alanine. The
sequence of the inserted region was confirmed with a 373A
DNA Sequencer (Applied Biosystems) using
5'-GTTGACAATTAATCATCCGGCTCGT-3' (synthesized and
purified with HPLC by Kurasiki-Bouseki) and the PRISM
Dye Terminator Cycle Sequencing Kit (Applied Biosystems).

The mutant protein GST-3X 2A Muc N1 was prepared
by utilizing E. coli and analyzed as in the case of
GST-3X Muc C1 described in Example 11. The GST-3X
2A Muc N1, which was detected in an eluted fraction
from a Glutathione Sepharose 4B column, showed a GST
activity equivalent to that of GST-3X. Further, it was
detected as a single band with a molecular weight of
about 28 K by SDS-PAGE.

The transfer of GalNAc to GST-3X 2A Muc N1 and
to GST-3X was analyzed in the same manner as the
method for measuring the GalNAc acceptor activity
except that 5 nmol of the GST mutant was used instead of 100
nmol of peptide.

FIG. 15 shows the results. As seen from FIG. 15, no
substantial GalNAc transfer to GST-3X was observed,
whereas GalNAc transfer to GST-3X 2A Muc N1 increased
over time. This suggests that a protein can be
altered into a protein that can bind a mucin type sugar
chain by adding a peptide sequence having a GalNAc
acceptor activity to the protein.

The above results and the results of Example 11
show that, when introducing a peptide sequence having
a GalNAc acceptor activity into a protein, the position for
introducing the peptide sequence in the protein is not
particularly limited.

Example 13 : Introduction of various peptide sequences
having GalNAc acceptor activity into a protein to alter
it into a mucin type glycoprotein

The transfer of GalNAc to GST-3X Muc C1 in Example
11 and that of GalNAc to GST-3X 2A Muc N1 in
Example 12 were confirmed by a method in which
SDS-PAGE and fluorography are combined.

In addition, GST-3X Muc C2, GST-3X Muc C3 and
GST-3X Muc C4 were prepared by using the model protein
GST-3X as in Example 11 but introducing into the
C-terminal side respective peptide sequences that show a
GalNAc acceptor activity and are different from that of
GST-3X Muc C1. For these
mutants, GalNAc transfer activities were examined as in
the preceding two examples.

The restriction sites in plasmid pGEX-3X impose limitations
on modifying the peptide sequence in the C-terminal
region of the model protein. Therefore, plasmid
pGEX-3XS was prepared by PCR for preparing various
mutant forms of the model protein. The plasmid was
constructed in the manner illustrated in FIG. 16.

The following primer DNAs were synthesized with a
394 DNA/RNA Synthesizer available from Applied Biosystems.

5'-ATGGTACCATGCGCGCCATTACCGAGT-3' (Synthesized DNA 5)

5'-CCGAGCTCTGTTTCCTGTGTGAAATTGT-3' (Synthesized DNA 6)

5'-CAGAGCTCATGTCCCCTATACTAGGTTA-3' (Synthesized DNA 7)

5'-GGACTAGTCATGTTGTGCTTGTCAGCTA-3' (Synthesized DNA 8)

The PCR process was conducted as described in
Example 12 with pGEX-3X as a template DNA and
Synthesized DNAs 5 and 6 or Synthesized DNAs 7 and
8 in combination, provided that 10 x AmpliTaq DNA
Polymerase Buffer and AmpliTaq DNA Polymerase
were replaced respectively by PCR buffer, 10 x conc.
with MgSO4 (Boehringer Mannheim) and Pwo DNA
polymerase (Boehringer Mannheim). The reaction product
obtained with the combination of Synthesized DNAs 7
and 8 was applied to agarose gel electrophoresis to collect
a DNA fragment of about 0.2 kb, which was then
extracted with phenol and purified by ethanol precipitation.
The DNA fragment was cut with restriction
enzymes Sac I and Sph I and inserted between the
same restriction enzyme sites of pSL1190 (Pharmacia
Biotech) according to a conventional method to produce
pPGST91. The sequence of the inserted DNA was analyzed
with a 373A DNA Sequencer (Applied Biosystems)
using the PRISM Dye Primer Cycle Sequencing Kit (Applied
Biosystems). The reaction product obtained with the
combination of Synthesized DNAs 5 and 6 was subjected to
a similar process, and a DNA fragment of about 1.2 kb was
isolated and purified, followed by cutting with restriction
enzymes Sac I and Kpn I. The fragment was inserted
between the same restriction enzyme sites of pPGST91
according to a conventional method to produce
pPGST92. Then, pPGST92 was cut with restriction
enzymes EcoR V and Bal I. The resulting 1.3 kb
fragment was inserted between the same restriction
enzyme sites of pGEX-3X according to a conventional
method to construct pGEX-3XS.

FIGS. 17A, 17B and 17C respectively show
restriction enzyme maps of plasmids pGEX-3XS Muc
C2, pGEX-3XS Muc C3 and pGEX-3XS Muc C4, which
are used for expressing GST-3X Muc C2, GST-3X Muc
C3 and GST-3X Muc C4.

These plasmids were constructed in a manner as 
described below. 

The following primer DNAs were synthesized.

5'-CGTCTAGACCGTCAGTCAGTCACGATGAAGGCGCGGGGGTCCCAC-3' (Synthesized DNA 9)

5'-CGTCTAGACCGTCAGTCAGTCACTATTAAGGCGCGGGGGTCCCAC-3' (Synthesized DNA 10)

5'-CGTCTAGACCGTCAGTCAGTCACGATGAAGGCCCGGGGGTCCCAC-3' (Synthesized DNA 11)

The PCR process was conducted as described
above with pGEX-3X as a template DNA and
Synthesized DNAs 7 and 9, Synthesized DNAs 7 and 10
or Synthesized DNAs 7 and 11 in combination. After
the reaction, each of the PCR reaction products was
purified as in Example 12, cut with restriction
enzymes Sac I and Xba I and inserted between the
same restriction sites of pBluescript II KS+ (Stratagene).
The constructed plasmids were respectively
named pBGSTC2 for the combination of Synthesized
DNAs 7 and 9, pBGSTC3 for the combination of
Synthesized DNAs 7 and 10 and pBGSTC4 for the
combination of Synthesized DNAs 7 and 11. The sequence
of each of the inserted DNAs was confirmed with a 373A
DNA Sequencer (Applied Biosystems) using the PRISM
Dye Primer Cycle Sequencing Kit (-21 M13) and (M13
Rev.) (Applied Biosystems). Finally, each of pBGSTC2,
pBGSTC3 and pBGSTC4 was cut with restriction
enzymes Sac I and EcoR I to obtain a fragment of about
0.7 kb. Each of the fragments was inserted between the
same restriction enzyme sites of pGEX-3XS to provide
pGEX-3XS Muc C2, pGEX-3XS Muc C3 or pGEX-3XS
Muc C4.

The mutant proteins GST-3X Muc C2, GST-3X Muc
C3 and GST-3X Muc C4 were prepared by utilizing E.
coli and analyzed in the manner described for GST-3X
Muc C1 in Example 11. Each of GST-3X Muc C2, GST-3X
Muc C3 and GST-3X Muc C4 detected in eluted fractions
from a Glutathione Sepharose 4B column showed a
specific GST activity comparable to that of GST-3X.
Further, each was detected as a single band with a
molecular weight of about 27 K by SDS-PAGE.

The transfer of GalNAc to each of the mutant GSTs
was analyzed in the following manner, in which SDS-PAGE
and fluorography were combined. A 50 µl
solution (50 mM imidazole-HCl (pH 7.2), 10 mM MnCl2,
0.5% Triton X-100 and 150 µM UDP-[3H]GalNAc) containing
5 nmol of the GST-3X mutant and 50 mU of partially
purified O-GalNAc T derived from bovine colostrum
was prepared and kept at 28°C for 20 hours. Then
50 µl of 2 x SDS/sample buffer (125 mM Tris-HCl (pH
6.8), 4% SDS, 4% 2-mercaptoethanol, 20% glycerol
and 0.004% Bromophenol Blue) was added thereto and
the mixture was left in boiling water for 5 minutes.
Thereafter, 30 µl of the reaction solution was applied
to 12.5% SDS-PAGE. The gel was then immersed in a
fixative solution (2-propanol/water/acetic acid
(25:65:10)) for 30 minutes and
then in Amplify (Amersham) for 30 minutes. The gel
was then vacuum dried at 80°C and kept in close contact with
an X-ray film at -80°C for 15 days for exposure.

FIG. 18 shows the results. While no GalNAc transfer
to GST-3X was observed, GalNAc was clearly transferred
to GST-3X Muc C1, GST-3X Muc C2, GST-3X
Muc C3, GST-3X Muc C4 and GST-3X 2A Muc N1. This
indicates that the sequences represented by the formula
X(-1)-T-P-X(2)-P, wherein X(-1) and X(2) represent
any amino acid, function within a protein to allow it to
have a mucin type sugar chain. It was also found that,
for introducing a peptide sequence, there are no
limitations on the region. In addition, there are no
limitations on the manner of introducing the peptide
sequence, i.e., insertion, addition and substitution can
each be used as appropriate. As clearly seen from
the cases of GST-3X Muc C2 and GST-3X Muc C4, a protein
capable of accepting a mucin type sugar chain can
be easily obtained by replacing only two or three amino
acid residues. This suggests that this technique is quite
useful for modifying a protein to be glycosylated with
a mucin type sugar chain.

Example 14 : Introduction of a peptide sequence having
GalNAc acceptor activity into a protein and secretory
expression of a mutant protein having a mucin type
sugar chain in eucaryotic cells

Examples 11 to 13 proved that proteins can be
altered into substrate proteins for in vitro GalNAc
transfer by introducing appropriate peptide sequences
having a GalNAc acceptor activity. Since E. coli is
devoid of a biosynthetic pathway for mucin type sugar
chains, GalNAc has to be transferred in vitro to
recombinant proteins produced in E. coli. In contrast,
since eucaryotic cells have such a pathway, they may be
utilized for directly producing mucin type glycoproteins.
Therefore, in this example, a gene encoding a mutant
protein including a peptide sequence having a GalNAc
acceptor activity was prepared and expressed in secreted
form in COS7 cells to confirm that the mutant mucin type
glycoprotein can be produced.

As in the preceding examples, GST was used as a
model protein. GST-3X Muc C1, which had been shown to
accept GalNAc, and GST-3X, which had been shown not to
accept GalNAc, in Examples 11 and 13 were expressed in COS7
cells.

Since GST is an intracellular protein, genes of GST-3X
and GST-3X Muc C1 having a signal sequence for
secretion added to the N-terminal were prepared
by a 2-step PCR process. The signal sequence of
human erythropoietin (hEPO) [K. Jacobs et al., Nature,
Vol.313, pp.806-810 (1985)] was used in the manner
described below.

The following four different primer DNAs were prepared
with a 394 DNA/RNA Synthesizer of Applied Biosystems.

5'-AACTCGAGAATTCATGGGGGTGCACGAATG-3' (Synthesized DNA 12)

5'-CAATAACCTAGTATAGGGGAGCCCAGGACTGGGAGGCCCA-3' (Synthesized DNA 13)

5'-TGGGCCTCCCAGTCCTGGGCTCCCCTATACTAGGTTATTG-3' (Synthesized DNA 14)

5'-CCTCTAGATCGTCAGTCACGTCAGATGAAT-3' (Synthesized DNA 15)

In the first step of the PCR process, a plasmid containing
cDNA of hEPO [H. Ohashi et
al., Biosci. Biotech. Vol.58, pp.758-759 (1994)] was
used as a template DNA along with Synthesized DNAs 12
and 13 as primers. The PCR reaction was carried out as
described in Example 13 except that the annealing
temperature was changed
to 58°C. A similar PCR reaction was also carried out
with pGEX-3X as a template DNA and Synthesized
DNAs 14 and 15 as primers. Each of the reaction products
was extracted once with chloroform and precipitated
with ethanol twice, followed by dissolving in 50 µl
of TE buffer. A template DNA was prepared by mixing 1
µl of the PCR reaction product of the signal peptide
region and 1 µl of the PCR reaction product of GST.
The second step of the PCR process
was carried out with the prepared template DNA
along with Synthesized DNAs 12 and 15 as primers.
The reaction and the purification of the reaction product
were exactly the same as those in Example 13. After the
product was cut with restriction enzymes Xho I and Xba
I, the fragment thus obtained was inserted between the
same restriction sites of pBluescript II KS+ according to
a conventional method. The obtained plasmid was
called pBEGST-3X. The sequence of the inserted
region of the plasmid was confirmed with the PRISM Dye
Primer Cycle Sequencing Kit (-21 M13) and (M13 Rev.)
(Applied Biosystems) in the same manner as described
in Example 13.
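
The joining of the two first-step products in the second PCR step relies on Synthesized DNAs 13 and 14 being complementary to each other. The sketch below, with hypothetical helper names, checks this for the primer sequences listed above and locates where the GST coding sequence begins in Synthesized DNA 14. The GST segment used for the search is the part of Synthesized DNA 7 (Example 13) that follows its ATG; treating it as the start of the GST coding sequence is an assumption made for illustration.

```python
# Illustrative check of the overlapping primers of the 2-step PCR (not part of the original procedure).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


SYNTHESIZED_DNA_13 = "CAATAACCTAGTATAGGGGAGCCCAGGACTGGGAGGCCCA"
SYNTHESIZED_DNA_14 = "TGGGCCTCCCAGTCCTGGGCTCCCCTATACTAGGTTATTG"

# Assumption: bases following the ATG in Synthesized DNA 7
# (CAGAGCTC ATG TCCCCTATACTAGGTTA), i.e. GST from its second codon.
GST_FROM_SECOND_CODON = "TCCCCTATACTAGGTTA"

fully_complementary = reverse_complement(SYNTHESIZED_DNA_14) == SYNTHESIZED_DNA_13
junction = SYNTHESIZED_DNA_14.find(GST_FROM_SECOND_CODON)  # 0-based index
signal_part = SYNTHESIZED_DNA_14[:junction]
gst_part = SYNTHESIZED_DNA_14[junction:]

print(fully_complementary, junction, signal_part, gst_part)
```

In this reading, the first 20 bases of Synthesized DNA 14 come from the signal sequence side and the remaining 20 bases begin at the TCC codon of GST, which is consistent with the secreted GST mutants starting from the serine at the second position of native GST-3X.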

A gene for a secreted form of GST-3X Muc
C1 was prepared in the same manner as described
above except that pGEX-3X Muc C1 was used instead
of pGEX-3X. pBEGST-3X Muc C1 was thus obtained.

The inserted regions of pBEGST-3X and
pBEGST-3X Muc C1 were cut out with restriction
enzymes Xho I and Xba I and collected. Thereafter, they
were inserted between the same restriction sites of
the plasmid vector pSVL for mammalian cells (Pharmacia
Biotech) according to a conventional method to give
pSEGST-3X and pSEGST-3X Muc C1. FIGS. 19A and
19B show restriction maps of the plasmids. It is
expected that, when the plasmids are introduced into a
mammalian cell, EGST-3X and EGST-3X Muc C1, the
GST mutants which start from the serine at the second
position from the N-terminal of the native GST-3X, will
be secreted into the culture.

Each of pSEGST-3X and pSEGST-3X Muc C1 was
introduced into COS7 cells (Riken Cell Bank) by
electroporation. More specifically, 10 of the plasmid was
added to about 5 x 10^6 cells in 0.8 ml of PBS(-) (Nissui
Pharmaceutical) and the DNA was introduced with a
Gene Pulser (Japan Biolaboratory) at room temperature
under the conditions of 1600 V and 25 µF. The cells
were put on a 90 mm laboratory dish and cultured in
10 ml of Dulbecco's modified Eagle's medium (Base
Catalogue No. 12,430) (Gibco BRL) containing 10% fetal
bovine serum at 37°C for 24 hours and thereafter
transferred to 10 ml of Dulbecco's modified Eagle's
medium (Base Catalogue No. 26,063) (Gibco BRL) and
cultured at 37°C for 3 days.

The secreted GST mutant in the culture supernatant
was purified by the method described in Example 11
for the E. coli culture, scaled down to 1/2, with a
0.5 ml Glutathione Sepharose 4B column. The
eluted fraction was concentrated and the buffer was
exchanged to 10 mM potassium phosphate buffer (pH 6.2)
with a Centricon-10 (Grace Japan).

The EGST-3X and EGST-3X Muc C1 thus obtained
showed a specific GST activity comparable to that of
GST-3X produced in E. coli. They were then subjected
to an analysis including a treatment with glycosidases
and a lectin blotting analysis in the following manner.

In the treatment with glycosidases, neuraminidase
derived from Arthrobacter ureafaciens (Boehringer
Mannheim) and O-glycanase (Genzyme) derived from
Diplococcus pneumoniae were used. The treatment
with neuraminidase was conducted by adding 40 mU of
the enzyme to about 300 ng of the GST mutant in 40 µl
of 20 mM potassium phosphate buffer (pH 6.2), followed
by incubating the solution at 37°C for 13 hours. In the
treatment with neuraminidase and O-glycanase, the
reaction solution of neuraminidase described above was
incubated at 37°C for 1 hour, and then 2 mU of
O-glycanase was added thereto and the reaction was
continued for 12 hours. As controls, a sample kept at 37°C for
13 hours and an untreated sample were prepared. Each
of the samples was treated with SDS by adding an equal
volume of 2 x SDS/sample buffer, and 15 µl of the
resulting solution was applied to SDS-PAGE. After
electrophoresis, the gel was stained with 2D-silver staining
reagent II "Daiichi" (Daiichi Pharmaceutical) to detect
the protein band.

The lectin blotting analysis was carried out with the DIG
Glycan Differentiation Kit (Boehringer Mannheim). In
this analysis, the procedure up to the SDS-PAGE
analysis was the same as that of the glycosidase
treatment. Thereafter, the proteins were blotted onto a PVDF
membrane [M. Ogasawara et al., Protein Experiment
Methods for the Study of Molecular Biology, Youdosha
1994] and the structure of galactose(β1-3)N-acetylgalactosamine
(Galβ1-3GalNAc) was analyzed with DIG
labelled lectin PNA.

The results of the glycosidase treatment are shown
in FIG. 20. The obtained EGST-3X was detected as a
substantially single band with a molecular weight of
about 27 K, which did not change after any glycosidase
treatment. On the other hand, EGST-3X Muc C1 was
detected as a single band that was shifted toward higher
molecular weight from the anticipated molecular
weight of 28 K for the protein portion thereof. This
band shifted toward lower molecular weight
upon treatment with neuraminidase or with a combination of
neuraminidase and O-glycanase. These facts show that a
typical mucin type sugar chain is bound to EGST-3X
Muc C1 and that the structure is Galβ1-3GalNAc with
sialic acids.

FIG. 21 shows the results of the lectin blotting analysis.
No band reacting with lectin PNA was detected for
EGST-3X regardless of the glycosidase treatment.
However, the band at about 28 K of EGST-3X Muc
C1 treated with neuraminidase reacted with the lectin.
Further, the band disappeared in a sample treated with
neuraminidase and O-glycanase. Therefore, it is clear
that the protein portion of EGST-3X Muc C1 has
Galβ1-3GalNAc with sialic acids, which is a typical mucin type
sugar chain.

From the above findings, a protein can be modified into
a glycoprotein having typical mucin type sugar chains
comprising three or more different types of
monosaccharides by introducing a peptide sequence having a
GalNAc acceptor activity and expressing the
protein in secreted form in eucaryotic cells.



Example 15 : Introduction of peptide sequences having
a GalNAc acceptor activity into a protein and secretory
expression of the modified protein having a mucin type
sugar chain in eucaryotic cells

Example 14 showed that a GST mutant obtained by
inserting a peptide sequence of MAAATPAPM was
secreted when expressed in COS7 cells and that the produced
EGST-3X Muc C1 had a typical mucin type sugar chain.
Thus, in this example, each of GST-3X Muc C2, GST-3X
Muc C3 and GST-3X Muc C4, which were confirmed to
function, like GST-3X Muc C1, as substrates for in vitro
GalNAc transfer in Example 13, was fused with a signal
peptide and expressed in secreted form in COS7 cells to
confirm whether the expressed proteins EGST-3X Muc C2,
EGST-3X Muc C3 and EGST-3X Muc C4 can bind mucin type
sugar chains. In addition, EGST-3X Muc C5, having the
sequence GTPGNSS, in which the amino acid at Position
+1 is proline, in the C-terminal region of EGST-3X was
also prepared.

For the secretory expression of EGST-3X, EGST-3X
Muc C1, EGST-3X Muc C2, EGST-3X Muc C3,
EGST-3X Muc C4 and EGST-3X Muc C5, plasmids
pEEGST-3X, pEEGST-3X Muc C1, pEEGST-3X Muc
C2, pEEGST-3X Muc C3, pEEGST-3X Muc C4 and
pEEGST-3X Muc C5 were prepared as described
below. FIG. 22 illustrates their restriction maps and the
amino acid sequences in the mutated regions.

After cutting pBEGST-3X with restriction enzyme
Xba I, it was partially digested with restriction enzyme
EcoR I to produce a DNA fragment of about 0.9 kb
containing the full-length structural gene of
EGST-3X. The fragment was then inserted between the
EcoR I and Xba I sites of pEF18S [T. Kato et al., J.
Biochem., Vol. 118, pp.229-236 (1995) and S. Mizushima et
al., Nucleic Acid Res. Vol. 18, 5322 (1990)] according to
a conventional method to produce pEEGST-3X.
pEF18S was used in expectation of a high level of
expression, because the level of expression of the
GST mutant with the plasmid vector pSVL was not so high
in Example 14.

pEEGST-3X Muc C1 was constructed in the same
manner as described above by using pBEGST-3X Muc
C1 in place of pBEGST-3X.

pEEGST-3X Muc C2 was constructed by cutting
pBGSTC2 of Example 13 with restriction enzymes Bal
I and Xba I to produce a DNA fragment of about 0.5 kb
and replacing the region between the same restriction
sites of pEEGST-3X with the fragment. pEEGST-3X
Muc C3 and pEEGST-3X Muc C4 were also constructed
from pBGSTC3 and pBGSTC4 of Example 13, respectively.

pEEGST-3X Muc C5 was constructed in the manner
described below. The following primer DNA was
synthesized.

5'-CGTCTAGACCGTCAGTCAGTCACGATGAATTGCCGGGGGTCCCAC-3' (Synthesized DNA 16)

The PCR process as described in Example 13 was
conducted except that pGEX-3X was used as a template
DNA and Synthesized DNA 7 of Example 13 was
combined with Synthesized DNA 16 as primers. After
the reaction, the PCR product was purified by the
method described in Example 12 and cut with restriction
enzymes Sac I and Xba I. The fragment was inserted
between the same restriction sites of pBluescript II KS+
(Stratagene). The plasmid thus obtained was called
pBGSTC5. The sequence of the inserted fragment was
confirmed with a 373A DNA Sequencer available from
Applied Biosystems using the PRISM Dye Primer Cycle
Sequencing Kit (-21M13) and (M13Rev.) (Applied
Biosystems). A DNA fragment of about 0.5 kb was obtained
by cutting pBGSTC5 with restriction enzymes Bal I and
Xba I and substituted for the region between the same
restriction enzyme sites of pEEGST-3X to construct
pEEGST-3X Muc C5.

Each of the prepared plasmids pEEGST-3X,
pEEGST-3X Muc C1, pEEGST-3X Muc C2, pEEGST-3X
Muc C3, pEEGST-3X Muc C4 and pEEGST-3X Muc C5
was introduced into COS7 cells and the GST mutant
that was secreted from the cells into the culture was
purified and concentrated in the manner described in
Example 14. The obtained EGST-3X, EGST-3X Muc
C1, EGST-3X Muc C2, EGST-3X Muc C3, EGST-3X
Muc C4 and EGST-3X Muc C5 showed a specific activity
level comparable to that of GST-3X produced by E.
coli. A part of each sample was analyzed by 13%
SDS-PAGE and silver staining to detect protein bands.

FIG. 23 shows the results. EGST-3X, which was
shown to have no sugar chain in Example 14, showed a
single band at about 27 K. In the case of EGST-3X Muc
C1, on the other hand, the major signal was found at the
position shown in Example 14 to correspond to the
protein binding a mucin type sugar chain, and a minor
signal was found at about
28 K, which corresponds to the protein without sugar
chains. Similarly, all of EGST-3X Muc C2, EGST-3X
Muc C3 and EGST-3X Muc C4 gave bands corresponding
to proteins having a mucin type sugar chain,
as in the case of EGST-3X Muc C1. In contrast,
EGST-3X Muc C5 gave a major band corresponding
to the protein without sugar chains and a minor band
corresponding to the protein having a mucin type sugar chain.

The above results proved that a protein can be
modified into a typical glycoprotein having a mucin type
sugar chain by introducing any of various peptide
sequences having a GalNAc acceptor activity into the
protein and expressing the protein in a eucaryotic cell.
More specifically, the results make it clear that the various
peptide sequences having a GalNAc acceptor activity
described in Examples 1 to 10 can also function well in a
protein as acceptors in the in vivo biosynthetic
pathway of mucin type sugar chains in eucaryotic
cells. In addition, the results of the introduction of
peptide sequences show that this technique is quite useful,
because introducing a mucin type sugar chain requires
the alteration of only one to three amino acid residues,
including the sugar chain binding site, in a protein.
Furthermore, the comparison of EGST-3X Muc C5, having a
proline only at Position +1, and EGST-3X Muc C4, having
prolines at both Positions +1 and +3, suggests that the
latter, which has a stronger GalNAc acceptor activity, can be
used more advantageously for efficient glycosylation
in producing glycoproteins having mucin type sugar
chains in vivo.
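
The sequence rule used throughout these examples, X(-1)-X(0)-X(1)-X(2)-X(3) with T or S at X(0) and proline at X(1) and/or X(3), can be expressed as a simple scan over a one-letter amino acid sequence. The following is a minimal sketch; the function name and output format are hypothetical. It only tests the sequence pattern and does not predict the actual level of GalNAc acceptor activity.

```python
# Illustrative scan for windows matching X(-1)-X(0)-X(1)-X(2)-X(3) (hypothetical helper).
def find_acceptor_windows(sequence: str) -> list[tuple[int, str, list[int]]]:
    """Return (0-based index of X(0), 5-residue window, proline positions among +1/+3)."""
    hits = []
    # X(0) needs one residue before it and three residues after it.
    for i in range(1, len(sequence) - 3):
        if sequence[i] not in "TS":
            continue
        proline_positions = [k for k in (1, 3) if sequence[i + k] == "P"]
        if proline_positions:
            hits.append((i, sequence[i - 1:i + 4], proline_positions))
    return hits


# Sequences quoted in the examples above.
muc_n1_tag = find_acceptor_windows("MAAATPAP")  # peptide added in Example 12
muc_c5_tail = find_acceptor_windows("GTPGNSS")  # C-terminal sequence of EGST-3X Muc C5

print(muc_n1_tag)
print(muc_c5_tail)
```

For MAAATPAP the scan reports the window ATPAP with proline at both +1 and +3, while for GTPGNSS it reports GTPGN with proline at +1 only, matching the distinction drawn above between sequences with prolines at both positions and EGST-3X Muc C5.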



Claims

1. A protein or peptide comprising a sequence represented
by formula (I):

X(-1)-X(0)-X(1)-X(2)-X(3)   (I)

wherein

X(-1) and X(2) represent independently any
amino acid,
X(0) represents threonine (T) or serine (S),
X(1) and X(3) represent independently any
amino acid provided that at least one of X(1)
and X(3) represents proline (P).

2. A protein or peptide according to claim 1, wherein
X(1) or X(3) represents P or A and at least either
one of them represents P.

3. A protein or peptide according to claim 1 or 2,
wherein X(1) represents P.

4. A protein or peptide according to claim 1 or 2,
wherein X(3) represents P.

5. A protein or peptide according to claim 1 or 2,
wherein X(1) and X(3) represent P.

6. A protein or peptide according to any one of claims
1 to 5, wherein X(-1) represents Y, A, W, S, G, V, F,
T or I and X(2) represents A, P, C, K, R, H, S, M, T,
Q, V, I, L or E.

7. A protein or peptide according to any one of claims
1 to 6, wherein X(0) represents T.

8. A protein or peptide sequence according to any one
of claims 1 to 7, wherein one or two amino acids
additionally exist to the N-terminal side of X(-1), the
amino acids being A, P, G, E, Q, T, R or D.

9. A peptide according to claim 1 and selected from
the following:
ATPAP,
AATPAP,
AAATPAA,
AAATAAP,
PAATAAP,
APATAAP,
AAPTAAP,
AAATAPP,
PAATPAP,
APATPAP,
AAXaTPXbP,
where Xa and Xb represent any amino acid but
either Xa or Xb represents A, and
AAATPAPXc,
where Xc represents any amino acid.
10. A protein or peptide according to claim 1, wherein
the protein or peptide comprises any of the
sequences according to claim 9.



11. Use of a protein or peptide according to any of
claims 1 to 10 for preparing a protein or peptide as
a substrate for UDP-GalNAc: polypeptide α1,O-GalNAc
transferase (O-GalNAc T) and comprising
GalNAc introduced therein.



12. A method of introducing a mucin type sugar chain 
into a protein or peptide comprising the steps of: 

providing a protein or peptide according to any of
claims 1 to 10; and

reacting the protein or peptide as a substrate 
with a mucin type sugar chain, thereby the 
mucin type sugar chain is introduced into the 
protein. 

13. A method of introducing GalNAc into a protein or 
peptide comprising the steps of: 

providing a protein or peptide according to any 
one of claims 1 to 10; and 
reacting the protein or peptide as a substrate 
with UDP-GalNAc (where UDP represents uridine
5'-diphosphate and GalNAc represents
N-acetylgalactosamine) in the presence of UDP-GalNAc:
polypeptide α1,O-GalNAc transferase
(O-GalNAc T).

14. A method according to claim 12 or 13, wherein a
protein or peptide according to any one of claims 1
to 10 is prepared by DNA recombination or chemical
synthesis.

15. A protein or peptide binding a mucin type sugar 
chain obtained by a method according to any one of 
claims 12 to 14. 

16. A method of preparing a protein or peptide having a
mucin type sugar chain comprising the steps of: 

transforming an eucaryotic cell with a DNA 
coding for a protein or peptide according to any 
one of claims 1 to 10; and 
expressing the protein or peptide in the trans- 
formed cell and secreting the protein or peptide 
from the cell. 

17. A method of introducing a mucin type sugar chain 
into a target position of a protein or peptide of inter- 
est comprising the steps of: 

inserting or adding a DNA coding for a 
sequence represented by formula (I) defined in 
claim 1 into a position which is in a DNA coding 
for the protein or peptide of interest and is
corresponding to the position where a mucin type
sugar chain is intended to be introduced, or 
replacing a partial DNA fragment including the 
position with a DNA coding for a sequence rep- 
resented by formula (I) defined in claim 1, 
thereby a DNA coding for a protein or peptide 
containing the DNA coding for the sequence 
represented by formula (I) is obtained; 
transforming an eucaryotic cell with the DNA 
obtained in the above step; and 
expressing the protein or peptide in the trans- 
formed cell and secreting the protein or peptide 
having a mucin type sugar chain from the cell. 

18. A method according to claim 16 or 17, wherein the 
eucaryotic cell is a mammalian cell. 

19. A method according to claim 16 or 17, wherein the 
DNA coding for the protein or peptide containing 
the DNA coding for a sequence represented by for- 
mula (I) is in the form of a vector. 

20. A protein or peptide having a mucin type sugar 
chain obtained by the method according to claim 16 
or 17. 

21. A DNA sequence coding for a sequence represented
by formula (I) defined in claim 1.

22. A DNA sequence coding for a protein or peptide 
according to any one of claims 1 to 10. 

23. A DNA sequence according to claim 21 or 22 and 
inserted in a vector. 